Abstract

A distributed algorithm is described for solving a linear algebraic equation of the form Ax = b 
assuming the equation has at least one solution. The equation is simultaneously solved by m agents
assuming each agent knows only a subset of the rows of the partitioned matrix [A b], the current
estimates of the equation’s solution generated by its neighbors, and nothing more. Each agent recursively 
updates its estimate by utilizing the current estimates generated by each of its neighbors. Neighbor 
relations are characterized by a time-dependent directed graph N(t) whose vertices correspond to agents 
and whose arcs depict neighbor relations. It is shown that for any matrix A for which the equation has 
a solution and any sequence of “repeatedly jointly strongly connected graphs” N(t), t = 1, 2, ..., the
algorithm causes all agents’ estimates to converge exponentially fast to the same solution to Ax = b. It 
is also shown that the neighbor graph sequence must actually be repeatedly jointly strongly connected 
if exponential convergence is to be assured. A worst case convergence rate bound is derived for the case 
when Ax = b has a unique solution. It is demonstrated that with minor modification, the algorithm can 
track the solution to Ax = b, even if A and b are changing with time, provided the rates of change of A 
and b are sufficiently small. It is also shown that in the absence of communication delays, exponential 
convergence to a solution occurs even if the times at which each agent updates its estimates are not 
synchronized with the update times of its neighbors. A modification of the algorithm is outlined which 
enables it to obtain a least squares solution to Ax = b in a distributed manner, even if Ax = b does 
not have a solution. 
Index Terms
I. Introduction

Certainly the most well known and probably the most important of all numerical computations 
involving real numbers is solving a system of linear algebraic equations. Efforts to develop 
distributed algorithms to solve such systems have been under way for a long time especially 
in the parallel processing community where the main objective is to achieve efficiency by 
somehow decomposing a large system of linear equations into smaller ones which can be solved 
on parallel processors more accurately or faster than direct solution of the original equations
would allow 0-0. In some cases, notably in sensor networking 0, I® and some filtering 
applications 0, the need for distributed processing arises naturally because processors onboard 
sensors or robots are physically separated from each other. In addition, there are typically 
communication constraints which limit the flow of information across a robotic or sensor network 
and consequently preclude centralized processing, even if efficiency is not the central issue. It 
is with these thoughts in mind that we are led to consider the following problem. 

II. The Problem 

We are interested in a network of m > 1 {possibly mobile} autonomous agents which are 
able to receive information from their “neighbors” where by a neighbor of agent i is meant any 
other agent within agent i's reception range. We write $\mathcal{N}_i(t)$ for the labels of agent i's neighbors
at time t, and we always take agent i to be a neighbor of itself. Neighbor relations at time t can 
be conveniently characterized by a directed graph N(t) with m vertices and a set of arcs defined 
so that there is an arc in N(t) from vertex j to vertex i just in case agent j is a neighbor of 
agent i at time t. Thus the directions of arcs represent the directions of information flow. Each 
agent i has a real-valued, time-dependent state vector $x_i(t)$ taking values in $\mathbb{R}^n$, and we assume that
the information agent i receives from neighbor j at time t is $x_j(t)$. We also assume that agent
i knows a pair of real-valued matrices $(A_i^{n_i \times n}, b_i^{n_i \times 1})$. The problem of interest is to devise local
algorithms, one for each agent, which will enable all m agents to iteratively and asynchronously
compute solutions to the linear equation Ax = b where $A = \mathrm{column}\{A_1, A_2, \ldots, A_m\}_{\bar{n} \times n}$,
$b = \mathrm{column}\{b_1, b_2, \ldots, b_m\}_{\bar{n} \times 1}$ and $\bar{n} = \sum_{i=1}^{m} n_i$. We shall require these solutions to be exact
up to numerical round off and communication errors. In the first part of this paper we will focus
on the synchronous case and we will assume that Ax = b has a solution although we will not 
require it to be unique. A restricted version of the asynchronous problem, in which communication
delays are ignored, is addressed in §VIII; a more general version of the asynchronous problem,
in which communication delays are explicitly taken into account, is treated in [10].

The problem just formulated can be viewed as a distributed parameter estimation problem 
in which the $b_i$ are observations available to the sensors and x is a parameter to be estimated.
In this setting, the observation equations are sometimes of the form $b_i = A_i x + \eta_i$, where $\eta_i$
is a term modeling measurement noise [8]. The most widely studied version of the problem
is when m = n, the $A_i$ are linearly independent row vectors $a_i$, the $b_i$ are scalars, and N(t)
is a constant, symmetric and strongly connected graph. For this version of the problem, A is
therefore an n x n nonsingular matrix, b is an n vector and agent i knows the state $x_j(t)$ of
each of its neighbors as well as its own state. The problem in this case is thus for each agent i
to compute $A^{-1}b$, given $a_i$, $b_i$ and $x_j(t)$, $j \in \mathcal{N}_i(t)$, $t > 0$. In this form, there are several classical
parallel algorithms which address closely related problems. Among these are Jacobi iterations
El, so-called “successive over-relaxations” 0 and the classical Kaczmarz method [j6l. Although
these are parallel algorithms, all rely on “relaxation factors” which cannot be determined in 
a distributed way unless one makes special assumptions about A. Additionally, the implicitly 
defined neighbor graphs for these algorithms are generally strongly complete; i.e., all processors 
can communicate with each other. 

This paper breaks new ground by providing an algorithm which is 

1) applicable to any pair of real matrices (A, b) for which Ax = b has at least one solution. 

2) capable of finding a solution at least exponentially fast {Theorem 1}.

3) applicable to the largest possible class of time-varying directed neighbor graphs N(t) for 
which exponential convergence can be assured {Theorem 2}.

4) capable of finding a solution to Ax = b which, in the absence of round off and communication
errors, is exact.

5) capable of finding a solution using at most an n dimensional state vector received at each 
clock time from each of its neighbors. 

6) applicable without imposing restrictive or unrealistic requirements such as (a) the assumption
that each agent is constantly aware of an upper bound on the number of neighbors
of each of its neighbors or (b) the assumption that all agents are able to share the same
time-varying step size. 

7) capable of operating asynchronously. 

An obvious approach to the problem we’ve posed is to reformulate it as a distributed optimization
problem and then try to use existing algorithms such as those in lUTfl - PTl to obtain a
solution. Despite the fact that there is a large literature on distributed optimization, we are not 
aware of any distributed optimization algorithm which, if applied to the problem at hand, would 
possess all of the attributes mentioned above, even if the capability of functioning asynchronously 
were not on the list. For the purpose of solving the problem of interest here, existing algorithms 
are deficient in various ways. Some can only find approximate solutions with bounded errors 
ED; some are only applicable to networks with bi-directional communications {i.e., undirected
graphs} and/or networks with fixed graph topologies [12] - [14], iflTll; many require all agents to
share a common, time varying step size [12], [14] - [19]; many introduce an additional scalar or
vector state [13], [14], [16], [18] - [21] for each agent to update and transmit; none have been
shown to generate solutions which converge exponentially fast, although it is plausible that some 
may exhibit exponential convergence when applied to the type of quadratic optimization problem 
one would set up to solve the linear equation which is of interest here. 

One limitation common to many distributed optimization algorithms is the requirement that 
each agent must be aware of an upper bound on the number of neighbors of each of its neighbors. 
This means that there must be bi-directional communications between agents. This requirement 
can be quite restrictive, especially if neighbor relations change with time. The requirement stems 
from the fact that most distributed optimization algorithms depend on some form of “distributed 
averaging.” Distributed averaging is a special type of consensus seeking for which the goal is for 
all m agents to ultimately compute the average of the initial values of their consensus variables.
In contrast, the goal of consensus seeking is for all agents to ultimately agree on a common 
value of their consensus variable, but that value need not be the average of their initial values. 
Because distributed averaging is a special form of consensus seeking, the methods used to obtain 
a distributed average are more specialized than those needed to reach a consensus. There are 
three different approaches to distributed averaging: linear iterations 0, [22], gossiping [23],
[24], and double linear iterations [25], which are also known as push-sum algorithms [16], [26],
[27] and scaled agreement algorithms [28].

Linear iterations for distributed averaging can be modeled as a linear recursion equation in
which the {possibly time-varying} update matrix must be doubly stochastic [23]. The doubly
stochastic matrix requirement cannot be satisfied without assuming that each agent knows an 
upper bound on the number of neighbors of each of its neighbors. A recent exception to this is 
the paper [29] where the idea is to learn weights within the requisite doubly stochastic matrix in
an asymptotic fashion. Although this idea is interesting, it also adds complexity to the distributed 
averaging process; in addition, its applicability is limited to time invariant graphs. 
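
The distinction between consensus and distributed averaging drawn above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the two 3 x 3 matrices, the initial values and the helper name `iterate` are all assumptions chosen for illustration. A stochastic matrix that is not doubly stochastic drives the agents to a common value that differs from the average, while a doubly stochastic one preserves the average.

```python
def iterate(S, x, steps):
    """Apply the linear iteration x <- S x repeatedly (S is a list of rows)."""
    for _ in range(steps):
        x = [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]
    return x

# Stochastic (rows sum to 1) but not doubly stochastic:
# the column sums are 1.5, 1.0 and 0.5.
S_ROW = [[1.0, 0.0, 0.0],
         [0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5]]

# Doubly stochastic: every row and every column sums to 1.
S_DOUBLY = [[0.5, 0.5, 0.0],
            [0.0, 0.5, 0.5],
            [0.5, 0.0, 0.5]]

x0 = [3.0, 0.0, 0.0]                # hypothetical initial consensus variables
average = sum(x0) / len(x0)         # the value distributed averaging must reach

consensus_only = iterate(S_ROW, x0, 200)   # agrees on agent 0's initial value
averaged = iterate(S_DOUBLY, x0, 200)      # agrees on the average
```

With these illustrative choices the first iteration settles on the initial value of agent 0, whereas the second settles on the average of the three initial values.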

Gossiping is a very widely studied approach to distributed averaging in which each agent 
is allowed to average its consensus variable with at most one other agent at each clock time. 
Gossiping protocols can lead to deadlock unless specific precautions are taken to ensure that
they do not, and these precautions generally lead to fairly complex algorithms [24] unless one
is willing to accept probabilistic solutions. 

Push-sum algorithms are based on a quite clever idea apparently first proposed in [26].
Such algorithms are somewhat more complicated than linear iterations, and generally require 
more data to be communicated between agents. They are however attractive because, at least for 
some implementations, the requirement that each agent know the number of neighbors of each 
of its neighbors is avoided [25].

Another approach to the problem we have posed is to reformulate it as a least squares problem. 
Distributed algorithms capable of solving the least squares problem have the advantage of being 
applicable to Ax = b even when this equation has no solution. The authors of [30], [31] develop
several algorithms for solving this type of problem and give sufficient conditions for them 
to work correctly; a limitation of their algorithms is that each agent is assumed to know the 
coefficient matrix $A_j$ of each of its neighbors. In [32], it is noted that the distributed least
squares problem can be solved by using distributed averaging to compute the average of the
matrix pairs $(A_i' A_i, A_i' b_i)$. The downside of this very clever idea is that the amount of data to
be communicated between agents does not scale well as the number of agents increases. In §IX
of this paper an alternative approach to the distributed least squares problem is briefly outlined; 
it too has scaling problems, but also appears to have the potential of admitting a modification 
which will to some extent overcome the scaling problem. 

Yet another approach to the problem of interest in this paper, is to view it as a consensus 
problem in which the goal is for all m agents to ultimately attain the same value for their states 
subject to the requirement that each $x_i$ satisfies the convex constraint $A_i x_i = b_i$. An algorithm for
solving a large class of constrained consensus problems of this type in a synchronous manner
appears in [15]. Specialization of that algorithm to the problem of interest here yields an
algorithm similar to the synchronous version of the algorithm which we will consider. The principal
difference between the two - apart from correctness proofs and claims about speed of convergence
- is that the algorithm stemming from [15] is based on distributed averaging and consequently
relies on convergence properties of doubly stochastic matrices whereas the synchronous version
of the algorithm developed in this paper does not. As a consequence, the algorithm stemming
from [15] cannot be implemented without assuming that each agent knows as a function of time,
at least an upper bound on the number of neighbors of each of its current neighbors, whereas the 
algorithm under consideration here can. Moreover, limiting the consensus updates to distributed 
averaging via linear iterations almost certainly limits the possible convergence rates which might 
be achieved, were one not constrained by the special structure of doubly stochastic matrices. We 
see no reason at all to limit the algorithm we are discussing to doubly stochastic matrices since, 
as this paper demonstrates, it is not necessary to. In addition, we mention that a convergence 
proof for the constrained consensus algorithm proposed in [15] which avoids doubly stochastic
matrices is claimed to have been developed in [33] but the correctness of the proof presented
there is far from clear. 

Perhaps the most important difference between the results of [15] and the results to be
presented here concerns speed of convergence. In this paper exponential convergence is established
for any sequence of repeatedly jointly strongly connected neighbor graphs. In [15], asymptotic
convergence is proved under the same neighbor graph conditions, but exponential convergence is
only proved in the special case when the neighbor graph is fixed and complete. It is not obvious
how to modify the analysis given in [15] to obtain a proof of exponential convergence under
more relaxed conditions. 

In contrast with earlier work on distributed optimization and distributed consensus, the approach
taken in this paper is based on a simple observation, inspired by [15], which has the
potential of being applicable to a broader class of problems than the one being considered here. Suppose
that one is interested in devising a distributed algorithm which can cause all members of a group
of m agents to find a solution x to the system of equations $a_i(x) = 0$, $i \in \{1, 2, \ldots, m\}$ where
$a_i : \mathbb{R}^n \to \mathbb{R}^{n_i}$ is a “private” function known only to agent i. Suppose each agent i is able to
find a solution $x_i$ to its private equation $a_i(x_i) = 0$, and in addition, all of the $x_i$ are the same.
Then all $x_i$ must satisfy $a_j(x_i) = 0$, $j \in \{1, 2, \ldots, m\}$ and thus each constitutes a solution to
the problem. Therefore to solve such a problem, one should try to craft an algorithm which, 
on the one hand causes each agent to satisfy its own private equation and on the other causes 
all agents to reach a consensus. We call this the agreement principle. We don’t know if it has 
been articulated before although it has been used before without special mention [34]. As we
shall see, the agreement principle is the basis for three different versions of the problem we are 
considering. 


III. The Algorithm 

Rather than go through the intermediate step of reformulating the problem under consideration 
as an optimization problem or as a constrained consensus problem, we shall approach the problem 
directly in accordance with the agreement principle. This was already done in [34] for the case
when neighbors do not change and the algorithm obtained was the same as the one we are
about to develop here. Here is the idea assuming that all agents act synchronously. Suppose time
is discrete in that t takes values in {1,2,...}. Suppose that at each time $t \ge 1$, agent i picks
as a preliminary estimate of a solution to Ax = b, a solution $z_i(t)$ to $A_i x = b_i$. Suppose that
$K_i$ is a basis matrix for the kernel of $A_i$. If we set $x_i(1) = z_i(1)$ and restrict the updating of
$x_i(t)$ to iterations of the form $x_i(t+1) = z_i(t) + K_i u_i(t)$, $t \ge 1$, then no matter what $u_i(t)$ is,
each $x_i(t)$ will obviously satisfy $A_i x_i(t) = b_i$, $t \ge 1$. Thus, in accordance with the agreement
principle, all we need to do to solve the problem is to come up with a good way to choose the
$u_i$ so that a consensus is ultimately reached. Capitalizing on what is known about consensus
algorithms [35] - [37], one would like to choose $u_i(t)$ so that $x_i(t+1) = \frac{1}{m_i(t)} \sum_{j \in \mathcal{N}_i(t)} x_j(t)$
where $m_i(t)$ is the number of neighbors of agent i at time t, but this is impossible to do
because $-z_i(t) + \frac{1}{m_i(t)} \sum_{j \in \mathcal{N}_i(t)} x_j(t)$ is not typically in the image of $K_i$. So instead one might
try choosing each $u_i(t)$ to minimize the difference $(z_i(t) + K_i u_i(t)) - \frac{1}{m_i(t)} \sum_{j \in \mathcal{N}_i(t)} x_j(t)$ in
the least squares sense. Thus the idea is to choose $x_i(t+1)$ to satisfy $A_i x_i(t+1) = b_i$ while
at the same time making $x_i(t+1)$ approximately equal to the average of agent i's neighbors’
current estimates of the solution to Ax = b. Doing this leads at once to an iteration for agent i
of the form
$$x_i(t+1) = z_i(t) - \frac{1}{m_i(t)} P_i \Big( m_i(t) z_i(t) - \sum_{j \in \mathcal{N}_i(t)} x_j(t) \Big), \qquad t \ge 1 \qquad (1)$$


where Pi is the readily computable orthogonal projection on the kernel of Ai. Note right away that 
the algorithm does not involve a relaxation factor and is totally distributed. While the intuition 
upon which this algorithm is based is clear, the algorithm’s correctness is not. 
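
The least squares step that produces this iteration can be spelled out. The following is a short derivation sketch; the symbol $y_i(t)$ for the neighbor average is introduced here only for convenience and does not appear in the paper.

```latex
% Neighbor average seen by agent i at time t (notation assumed for this sketch)
\[
y_i(t) = \frac{1}{m_i(t)} \sum_{j \in \mathcal{N}_i(t)} x_j(t), \qquad
u_i(t) \in \arg\min_{u} \bigl\| z_i(t) + K_i u - y_i(t) \bigr\|_2 .
\]
% The columns of K_i span ker A_i and P_i is the orthogonal projection on ker A_i,
% so the minimizing K_i u is the projection of y_i(t) - z_i(t) on that subspace:
\[
K_i u_i(t) = P_i \bigl( y_i(t) - z_i(t) \bigr).
\]
% Substituting into x_i(t+1) = z_i(t) + K_i u_i(t):
\[
x_i(t+1) = z_i(t) + P_i \bigl( y_i(t) - z_i(t) \bigr)
         = z_i(t) - \frac{1}{m_i(t)} P_i \Bigl( m_i(t) z_i(t) - \sum_{j \in \mathcal{N}_i(t)} x_j(t) \Bigr).
\]
```

Although $u_i(t)$ itself need not be unique unless $K_i$ has full column rank, the product $K_i u_i(t)$ is, which is why the resulting update involves only $P_i$.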

It is easy to see that $(I - P_i) z_i(t)$ is fixed no matter what $z_i(t)$ is, just so long as it is a
solution to $A_i x = b_i$. Since $x_i(t)$ is such a solution, (1) can also be written as

$$x_i(t+1) = x_i(t) - \frac{1}{m_i(t)} P_i \Big( m_i(t) x_i(t) - \sum_{j \in \mathcal{N}_i(t)} x_j(t) \Big), \qquad t \ge 1 \qquad (2)$$


and it is this form which we shall study. Later in §VII when we focus on a generalization of
the problem in which A and b change slowly with time, the corresponding generalizations of (1)
and (2) are not quite equivalent and it will be more convenient to focus on the generalization
corresponding to (1).
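
To make the update rule concrete, here is a minimal simulation sketch of the second form of the iteration, the one in which each agent corrects its own current estimate $x_i(t)$. It makes several simplifying assumptions that are not part of the paper: each agent holds a single nonzero row $a_i$ and scalar $b_i$ (so the projection on the kernel has the closed form $v - a_i (a_i \cdot v)/(a_i \cdot a_i)$), the neighbor graph is a fixed directed ring with self-arcs, and the 3 x 3 data, the function names and the iteration count are all illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_kernel(a, v):
    """P_i v for a single nonzero row a: remove the component of v along a."""
    c = dot(a, v) / dot(a, a)
    return [vk - c * ak for vk, ak in zip(v, a)]

def initial_estimate(a, b):
    """One particular solution of a . x = b (the minimum-norm one)."""
    c = b / dot(a, a)
    return [c * ak for ak in a]

def step(rows, x, neighbors):
    """One synchronous round of the update for every agent."""
    new_x = []
    for i, a in enumerate(rows):
        m_i = len(neighbors[i])
        # m_i(t) x_i(t) minus the sum of x_j(t) over j in N_i(t)
        diff = [m_i * x[i][k] - sum(x[j][k] for j in neighbors[i])
                for k in range(len(a))]
        p = project_onto_kernel(a, diff)
        new_x.append([x[i][k] - p[k] / m_i for k in range(len(a))])
    return new_x

# Hypothetical data: A is nonsingular and b is chosen so that x* = (1, 2, -1).
rows = [[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]]
b = [4.0, 1.0, -2.0]
# Each agent hears itself and one other agent: a directed ring with self-arcs.
neighbors = [[0, 2], [1, 0], [2, 1]]

x = [initial_estimate(a, bi) for a, bi in zip(rows, b)]
for _ in range(5000):
    x = step(rows, x, neighbors)
```

Because every correction lies in the kernel of $a_i$, each estimate keeps satisfying its own private equation throughout, and only the disagreement between agents shrinks. No step size or relaxation factor appears anywhere in the loop.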

As mentioned in the preceding section, by specializing the constrained consensus problem 
treated in [15] to the problem of interest here, one can obtain an update rule similar to (2).
Thus the arguments in [15] can be used to establish asymptotic convergence in the case of
synchronous operation. Of course using the powerful but lengthy and intricate proofs developed
in [15] to address the specific constrained consensus problem posed here, would seem to be a
roundabout way of analyzing the problem, were there available a direct and more transparent
method. One of the main contributions of this paper is to provide just such a method. The
method closely parallels the well-known approach to unconstrained consensus problems based
on nonhomogeneous Markov chains [36], [38]. The standard unconstrained consensus problem is
typically studied by looking at the convergence properties of infinite products of m x m stochastic
matrices. On the other hand, the problem posed in this paper is studied by looking at infinite
products of matrices of the form $P(S \otimes I)P$ where P is a block diagonal matrix of m, n x n
orthogonal projection matrices, S is an m x m stochastic matrix, I is the n x n identity, and $\otimes$ is the
Kronecker product. For the standard unconstrained consensus problem, the relevant measure of
the distance of a stochastic matrix S from the desired limit of a rank one stochastic matrix is
the infinity matrix semi-norm [24] which is also the same thing as the well known coefficient of
ergodicity [38]. For the problem posed in this paper, the relevant measure of the distance of a
matrix of the form $P(S \otimes I)P$ from the desired limit of the zero matrix, is a somewhat unusual
but especially useful concept called a “mixed-matrix” norm {§VI-A}.

IV. Organization 

The remainder of this paper is organized as follows. The discrete-time synchronous case is 
treated first. We begin in Section V by stating conditions on the sequence of neighbor graphs
N(1), N(2), ... encountered along a “trajectory,” for the overall distributed algorithm based on
(2) to converge exponentially fast to a solution to Ax = b. The conditions on the neighbor graph
sequence are both sufficient {Theorem 1} and necessary {Theorem 2}. A worst case geometric
convergence rate is then given {Corollary 1} for the case when Ax = b has a unique solution.

Analysis of the synchronous case is carried out in §VI. After developing the relevant linear
iteration ®, attention is focused in §VI-A on proving that repeatedly jointly strongly connected
neighbor graph sequences are sufficient for exponential convergence. For the case when Ax = b
has a unique solution, the problem reduces to finding conditions {Theorem 3} on an infinite
sequence of m x m stochastic matrices $S_1, S_2, \ldots$ with positive diagonals under which an infinite
sequence of matrix products of the form $(P(S_k \otimes I)P)(P(S_{k-1} \otimes I)P) \cdots (P(S_1 \otimes I)P)$, $k \ge 1$
converges to the zero matrix. The problem is similar to the problem of determining conditions on an
infinite sequence of m x m stochastic matrices $S_1, S_2, \ldots$ with positive diagonals under which
an infinite sequence of matrix products of the form $S_k S_{k-1} \cdots S_1$, $k \ge 1$ converges to a rank
one stochastic matrix. The latter problem is addressed in the standard consensus literature by 
exploiting several facts: 

1) The induced infinity matrix semi-norm [24] {i.e., the coefficient of ergodicity [38]} is
sub-multiplicative on the set of m x m stochastic matrices.

2) Every finite product of stochastic matrices is non-expansive in the induced infinity matrix
semi-norm [24].

3) Every sufficiently long product of stochastic matrices with positive diagonals is a semi-contraction
in the infinity semi-norm provided the graphs of the stochastic matrices appearing
in the product are all rooted¹ [24], [35], [39].

¹ A directed graph is rooted if it contains at least one vertex r from which, for each vertex v in the graph, there is a directed
path from r to v.
There are parallel results for the problem of interest here:

1) The mixed matrix norm defined by (10) is sub-multiplicative on $\mathbb{R}^{mn \times mn}$ {Lemma 3}.

2) Every finite matrix product of the form $(P(S_k \otimes I)P)(P(S_{k-1} \otimes I)P) \cdots (P(S_q \otimes I)P)$
is non-expansive in the mixed matrix norm {Proposition 1}.

3) Every sufficiently long product of such matrices is a contraction in the mixed matrix norm
provided the stochastic matrices appearing in the product have positive diagonals and 
graphs which are all strongly connected {Proposition [2]}. 

While there are many similarities between the consensus problem and the problem under 
consideration here, one important difference is that the set of m x m stochastic matrices is 
closed under multiplication whereas the set of matrices of the form $P(S \otimes I)P$ is not. To
deal with this, it is necessary to introduce the idea of a “projection block matrix” {§VI-C2}.
A projection block matrix is a partitioned matrix whose specially structured blocks are called
“projection matrix polynomials” {§VI-C1}. What is important about this concept is that the set of
projection block matrices is closed under multiplication and contains every matrix product of the
form $(P(S_k \otimes I)P)(P(S_{k-1} \otimes I)P) \cdots (P(S_q \otimes I)P)$. Moreover, it is possible to give conditions
under which a projection block matrix is a contraction in the mixed matrix norm {Proposition
3}. Specialization of this result yields a characterization of the class of matrices of the form
$(P(S_k \otimes I)P)(P(S_{k-1} \otimes I)P) \cdots (P(S_q \otimes I)P)$ which are contractions {Proposition 2}. This,
in turn is used to prove Theorem [3] which is the main technical result of the paper. 

The proof of Theorem 1 is carried out in two steps. The case when Ax = b has a unique
solution is treated first. Convergence in this case is an immediate consequence of Theorem 3.
The general case without the assumption of uniqueness is treated next. In this case, Lemma 1
is used to decompose the problem into two parts - one to which the results for the uniqueness 
case are directly applicable and the other to which standard unconstrained consensus results are 
applicable. 

It is well known that the necessary condition for a standard unconstrained consensus algorithm
to generate an exponentially convergent solution is that the sequence of neighbor graphs
encountered be “repeatedly jointly rooted” [40]. Since a “repeatedly jointly strongly connected
sequence” is always a repeatedly jointly rooted sequence, but not conversely, it may at first 
glance seem surprising that for the problem under consideration in this paper, repeatedly jointly 
strongly connected sequences are in fact necessary for exponential convergence. Nonetheless 
they are and a proof of this claim is given in Section VI-B. The proof relies on the concept of
an “essential vertex” as well as the idea of “a mutual reachable equivalence class.” These ideas
can be found in [38] and [41] under different names.

Theorem 3 is proved in §VI-C. The proof relies heavily on a number of concepts mentioned
earlier including the mixed matrix norm, projection matrix polynomials {§VI-C1}, and projection
block matrices {§VI-C2}. These concepts also play an important role in §VI-D where the
worst case convergence rate stated in Corollary 1 is justified. To underscore the importance of
exponential convergence, it is explained in §VII why, with minor modification, the algorithm
we have been considering can track the solution to Ax = b, if A and b are changing with time,
provided the rates of change of A and b are sufficiently small. Finally, the asynchronous version
of the problem is addressed in Section VIII.

A limitation of the algorithm we have been discussing is that it is only applicable to linear 
equations for which there are solutions. In §IX we explain how to modify the algorithm so that
it can obtain least squares solutions to Ax = b even in the case when Ax = b does not have 
a solution. As before, we approach the problem using standard consensus concepts rather than 
the more restrictive concepts based on distributed averaging. 

A. Notation 

If M is a matrix, $\mathcal{M}$ denotes its column span. If n is a positive integer, $\mathbf{n} = \{1, 2, \ldots, n\}$.
Throughout this paper $\mathcal{G}_{sa}$ denotes the set of all directed graphs with m vertices which have
self-arcs at all vertices. The graph of an m x m matrix M with nonnegative entries is an m
vertex directed graph $\gamma(M)$ defined so that (i, j) is an arc from i to j in the graph just in case
the ji-th entry of M is nonzero. Such a graph is in $\mathcal{G}_{sa}$ if and only if all diagonal entries of M
are positive.


V. Synchronous Operation 

Obviously conditions for convergence of the m iterations defined by (2) must depend on
neighbor graph connectivity. To make precise just what is meant by connectivity in the present
context, we need the idea of “graph composition” [35]. By the composition of a directed
graph $G_p \in \mathcal{G}_{sa}$ with a directed graph $G_q \in \mathcal{G}_{sa}$, written $G_q \circ G_p$, is meant that directed graph in
$\mathcal{G}_{sa}$ with arc set defined so that (i, j) is an arc in the composition just in case there is a vertex
k such that (i, k) is an arc in $G_p$ and (k, j) is an arc in $G_q$. It is clear that $\mathcal{G}_{sa}$ is closed under
composition and composition is an associative binary operation; because of this, the definition
extends unambiguously to any finite sequence of directed graphs in $\mathcal{G}_{sa}$. Composition is defined
so that for any pair of nonnegative m x m matrices $M_1, M_2$, with graphs $\gamma(M_1), \gamma(M_2) \in \mathcal{G}_{sa}$,
$\gamma(M_2 M_1) = \gamma(M_2) \circ \gamma(M_1)$.
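
The relation between graph composition and matrix multiplication can be checked on a small case. The sketch below represents a graph by its set of arcs; the helper names `graph_of`, `compose` and `matmul` and the two 3 x 3 nonnegative matrices with positive diagonals are assumptions made for illustration only.

```python
def graph_of(M):
    """Arc set of the graph of M: (i, j) is an arc iff the (j, i) entry is nonzero."""
    n = len(M)
    return {(i, j) for i in range(n) for j in range(n) if M[j][i] != 0}

def compose(G_q, G_p):
    """Arc set of G_q o G_p: (i, j) iff (i, k) is in G_p and (k, j) is in G_q for some k."""
    return {(i, j) for (i, k) in G_p for (k2, j) in G_q if k == k2}

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

# Hypothetical nonnegative matrices with positive diagonals.
M1 = [[0.5, 0.0, 0.5],
      [0.0, 1.0, 0.0],
      [0.0, 0.5, 0.5]]
M2 = [[1.0, 0.0, 0.0],
      [0.5, 0.5, 0.0],
      [0.0, 0.0, 1.0]]

lhs = graph_of(matmul(M2, M1))             # graph of the product M2 M1
rhs = compose(graph_of(M2), graph_of(M1))  # composition of the two graphs
```

In this example the arc from vertex 2 to vertex 1 appears in the composition even though neither factor graph contains it, which is how information relayed over two steps shows up in the product.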

To proceed, let us agree to say that an infinite sequence of graphs $G_1, G_2, \ldots$ in $\mathcal{G}_{sa}$ is
repeatedly jointly strongly connected, if for some finite positive integers $l$ and $\tau_0$ and each
integer k > 0, the composed graph $G_{kl+\tau_0-1} \circ G_{kl+\tau_0-2} \circ \cdots \circ G_{(k-1)l+\tau_0}$ is strongly
connected. Thus if $N_1, N_2, \ldots$ is a sequence of neighbor graphs which is repeatedly jointly
strongly connected, then over each successive interval of l consecutive iterations starting at $\tau_0$,
each proper subset of agents receives some information from the rest. The first of the two main
results of this paper for synchronous operation is as follows. 

Theorem 1: Suppose each agent i updates its state $x_i(t)$ according to rule (2). If the sequence
of neighbor graphs N(t), $t \ge 1$, is repeatedly jointly strongly connected, then there exists a
positive constant $\lambda < 1$ for which all $x_i(t)$ converge to the same solution to Ax = b as $t \to \infty$,
as fast as $\lambda^t$ converges to 0.

In the next section we explain why this theorem is true. 
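
The hypothesis of the theorem can be tested on a finite window of graphs. The following sketch assumes a hypothetical periodic sequence of three graphs on three vertices, none of which is strongly connected on its own, and checks that their composition over one period is; the function names and the example graphs are illustrative and not from the paper.

```python
def compose(G_q, G_p):
    """Arc set of G_q o G_p: (i, j) iff (i, k) is in G_p and (k, j) is in G_q for some k."""
    return {(i, j) for (i, k) in G_p for (k2, j) in G_q if k == k2}

def is_strongly_connected(arcs, m):
    """True if every vertex can reach every other vertex along the arcs."""
    succ = {v: set() for v in range(m)}
    for i, j in arcs:
        succ[i].add(j)
    for start in range(m):
        seen, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in succ[v] - seen:
                seen.add(w)
                stack.append(w)
        if len(seen) < m:
            return False
    return True

def window_composition(graphs):
    """Compose a list of graphs ordered from earliest to latest."""
    result = graphs[0]
    for G in graphs[1:]:
        result = compose(G, result)
    return result

m = 3
self_arcs = {(v, v) for v in range(m)}
G1 = self_arcs | {(0, 1)}   # only agent 1 hears from agent 0
G2 = self_arcs | {(1, 2)}   # only agent 2 hears from agent 1
G3 = self_arcs | {(2, 0)}   # only agent 0 hears from agent 2
window = window_composition([G1, G2, G3])
```

If the sequence G1, G2, G3, G1, G2, G3, ... repeats indefinitely, every window of length three starting at the first graph composes to the same strongly connected graph, so the sequence meets the definition with window length 3 and starting time 1.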

The idea of a repeatedly jointly strongly connected sequence of graphs is the direct analog 
of the idea of a “repeatedly jointly rooted” sequence of graphs; the repeatedly jointly rooted 
condition, which is weaker than the repeatedly jointly strongly connected condition, is known to 
be not only a sufficient condition but also a necessary one on an infinite sequence of neighbor 
graphs in $\mathcal{G}_{sa}$ for all agents in an unconstrained consensus process to reach a consensus exponentially
fast [40]. The question then is, is repeatedly jointly strong connectivity
necessary for exponential convergence of the $x_i$ to a solution to Ax = b? Obviously such a
condition cannot be necessary in the special case when A = 0 {and consequently b = 0}
because in this case the problem reduces to an unconstrained consensus problem. The repeatedly
jointly strongly connected condition also cannot be necessary if a distributed solution to Ax = b
can be obtained by only a proper subset of the full set of m agents. Prompted by this, let us
agree to say that agents with labels in $\mathcal{V} = \{i_1, i_2, \ldots, i_q\} \subset \mathbf{m}$ are redundant if any solution
to the equations $A_i x = b_i$ for all i in the complement of $\mathcal{V}$, is a solution to Ax = b. To derive
an algebraic condition for redundancy, suppose that z is a solution to Ax = b. Write $\bar{\mathcal{V}}$ for the
complement of $\mathcal{V}$ in $\mathbf{m}$. Then any solution w to the equations $A_i x = b_i$, $i \in \bar{\mathcal{V}}$ must satisfy
$w - z \in \bigcap_{i \in \bar{\mathcal{V}}} \mathcal{P}_i$, where for $i \in \mathbf{m}$, $\mathcal{P}_i = \mathrm{image}\, P_i$. Thus agents with labels in $\mathcal{V}$ will be
redundant just in case $w - z \in \bigcap_{i \in \mathcal{V}} \mathcal{P}_i$. Therefore agents with labels in $\mathcal{V}$ will be redundant if
and only if

$$\bigcap_{i \in \bar{\mathcal{V}}} \mathcal{P}_i \subset \bigcap_{i \in \mathcal{V}} \mathcal{P}_i.$$

We say that $\{P_1, P_2, \ldots, P_m\}$ is a non-redundant set if no such proper subset exists. We can
now state the second main result of this paper for synchronous operation. 

Theorem 2: Suppose each agent $i$ updates its state $x_i(t)$ according to the update rule. Suppose in
addition that $A \neq 0$ and that $\{P_1, P_2, \ldots, P_m\}$ is a non-redundant set. If there exists a positive
constant $\lambda < 1$ for which all $x_i(t)$ converge to the same solution to $Ax = b$ as $t \to \infty$ as fast
as $\lambda^t$ converges to 0, then the sequence of neighbor graphs $\mathbb{N}(t)$, $t \geq 1$, is repeatedly jointly
strongly connected.

In §VI-B we explain why this theorem is true.

For the case when $Ax = b$ has a unique solution and each of the neighbor graphs $\mathbb{N}(t)$, $t \geq 1$,
is strongly connected, it is possible to derive an explicit worst case bound on the rate at which
the $x_i$ converge. As will be explained at the beginning of §VI-A, the uniqueness assumption
is equivalent to the assumption that $\bigcap_{i \in \mathbf{m}} \mathcal{P}_i = 0$. This and Lemma 2 imply that the induced
two-norm $|\cdot|_2$ of any finite product of the form $P_{j_1} P_{j_2} \cdots P_{j_k}$ is less than 1, provided each of
the $P_i$, $i \in \mathbf{m}$, occurs in the product at least once. Thus if $q = (m-1)^2$ and $\mathcal{C}$ is the set of all
such products of length $q + 1$, then $\mathcal{C}$ is compact and

$$\rho = \max_{\mathcal{C}} |P_{j_1} P_{j_2} \cdots P_{j_{q+1}}|_2 \qquad (3)$$

is a number less than 1. So therefore is

$$\lambda = \left(1 - \frac{(m-1)(1-\rho)}{m^q}\right)^{\frac{1}{q}}. \qquad (4)$$


We are led to the following result. 

Corollary 1: Suppose that $Ax = b$ has a unique solution $x^*$. Let $\lambda$ be given by (4). If each of
the neighbor graphs $\mathbb{N}(t)$, $t \geq 1$, mentioned in the statement of Theorem 1 is strongly connected,
then all $x_i(t)$ converge to $x^*$ as $t \to \infty$ as fast as $\lambda^t$ converges to 0.
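
To make the bound concrete, the following Python sketch evaluates $\rho$ and $\lambda$ by brute force for a small problem. It assumes that (4) reads $\lambda = (1 - (m-1)(1-\rho)/m^q)^{1/q}$ with $q = (m-1)^2$; the helper names `kernel_projector` and `rate_bound` and the two-agent data are hypothetical and chosen only for illustration.

```python
import itertools
from functools import reduce

import numpy as np


def kernel_projector(A_i):
    """Orthogonal projector onto ker(A_i) (illustrative helper)."""
    A_i = np.atleast_2d(np.asarray(A_i, dtype=float))
    return np.eye(A_i.shape[1]) - np.linalg.pinv(A_i) @ A_i


def rate_bound(projs):
    """Return (rho, lam) as read from (3) and (4); brute force, m >= 2 and small."""
    m = len(projs)
    q = (m - 1) ** 2
    rho = 0.0
    # Enumerate index sequences of length q + 1 in which every agent appears.
    for seq in itertools.product(range(m), repeat=q + 1):
        if len(set(seq)) < m:
            continue
        prod = reduce(np.matmul, [projs[j] for j in seq])
        rho = max(rho, np.linalg.norm(prod, 2))
    lam = (1.0 - (m - 1) * (1.0 - rho) / m ** q) ** (1.0 / q)
    return rho, lam


# Hypothetical data: two agents holding the rows [1 0] and [1 1].
projs = [kernel_projector([1.0, 0.0]), kernel_projector([1.0, 1.0])]
rho, lam = rate_bound(projs)
print(rho, lam)
```

For these two kernels, which are lines at 45 degrees to each other, $\rho$ is the cosine of that angle and $\lambda = (1+\rho)/2$. The enumeration grows as $m^{q+1}$, so the sketch is only practical for very small $m$.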




A proof of this corollary will be given in §VI-D. The extension of this result to the case
when $Ax = b$ has more than one solution can also be worked out, but this will not be done here.
It is likely that $\rho$ can be related to a condition number for $A$, but this will not be done here.

In the consensus literature [37], researchers have also looked at algorithms using convex
combination rules rather than the straight average rule which we have exploited here. Applying such
rules to the problem at hand leads to update equations of the more general form

$$x_i(t+1) = x_i(t) - P_i\left(x_i(t) - \sum_{j \in \mathcal{N}_i(t)} w_{ij}(t) x_j(t)\right), \quad i \in \mathbf{m} \qquad (5)$$

where the $w_{ij}(t)$ are nonnegative numbers summing to 1 and uniformly bounded from below by a
positive constant. The extension of the analysis which follows to encompass this generalization is
straightforward. It should be pointed out, however, that innocent looking generalizations of these
update laws which one might want to consider can lead to problems. For example, problems
can arise if the same value of $w_{ij}$ is not used to weigh all of the components of agent $j$'s
state in agent $i$'s update equation. To illustrate this, consider a network with a fixed two agent
strongly connected graph and $A_1 = [1 \ \ 1]$ and $A_2 = [-a \ \ -1]$. Suppose agent 1 uses weights
$w_{1j} = .5$ to weigh both components of $x_j$, $j \in \mathbf{2}$, but agent 2 weights the first components of
state vectors $x_1$ and $x_2$ with .25 and .75 respectively while weighing the second components of
both with .5. A simple computation reveals that the spectral radius of the relevant update matrix
for the state of the system determined by (5) will exceed 1 for values of $a$ in the open interval
(.5, 1).
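
The computation referred to above can be sketched numerically as follows. The sketch assumes that the example reads $A_1 = [1\ 1]$, $A_2 = [-a\ -1]$, that the "relevant update matrix" is the one acting on the error states $e_i = P_i e_i$, and that per-component weights enter (5) as diagonal matrices $W_{ij}$ in place of the scalars $w_{ij}$; the function names are hypothetical.

```python
import numpy as np


def line_projector(v):
    """Orthogonal projector onto span{v}."""
    v = np.asarray(v, dtype=float)
    return np.outer(v, v) / (v @ v)


def error_update_matrix(a):
    """Error-state update matrix of the two-agent example (assumed reading of (5))."""
    P1 = line_projector([1.0, -1.0])   # ker [1 1]
    P2 = line_projector([1.0, -a])     # ker [-a -1]
    I = np.eye(2)
    W11 = W12 = 0.5 * I                # agent 1: one scalar weight per neighbor
    W21 = np.diag([0.25, 0.5])         # agent 2: per-component weights on x_1
    W22 = np.diag([0.75, 0.5])         # agent 2: per-component weights on x_2
    # e_i(t+1) = e_i - P_i (e_i - sum_j W_ij e_j), restricted to e_i = P_i e_i
    return np.block([
        [(I - P1 + P1 @ W11) @ P1, P1 @ W12 @ P2],
        [P2 @ W21 @ P1, (I - P2 + P2 @ W22) @ P2],
    ])


def spectral_radius(a):
    return float(np.max(np.abs(np.linalg.eigvals(error_update_matrix(a)))))


for a in (0.25, 0.75, 1.5):
    print(a, spectral_radius(a))
```

Under these assumptions the spectral radius is above 1 at $a = 0.75$ and below 1 at $a = 0.25$ and $a = 1.5$, which is consistent with the interval stated in the text.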


VI. Analysis 

In this section we explain why Theorems 1 and 2 are true. As a first step, we translate the
state $x_i$ of the update rule to a new shifted state $e_i$ which can be interpreted as the error between $x_i$ and
a solution to $Ax = b$; as we shall see, this simplifies the analysis. Towards this end, let $x^*$ be
any solution to $Ax = b$. Then $x^*$ must satisfy $A_i x^* = b_i$ for $i \in \mathbf{m}$. Thus if we define

$$e_i(t) = x_i(t) - x^*, \quad i \in \mathbf{m}, \ t \geq 1 \qquad (6)$$

then $e_i(t) \in \mathcal{P}_i$, $t \geq 1$, because $\mathcal{P}_i = \ker A_i$. Therefore $P_i e_i(t) = e_i(t)$, $i \in \mathbf{m}$, $t \geq 1$. Moreover
from the update rule,

$$e_i(t+1) = P_i e_i(t) - \frac{1}{m_i(t)} P_i \left( m_i(t) P_i e_i(t) - \sum_{j \in \mathcal{N}_i(t)} P_j e_j(t) \right)$$

for $t \geq 1$, $i \in \mathbf{m}$, which simplifies to

$$e_i(t+1) = \frac{1}{m_i(t)} P_i \sum_{j \in \mathcal{N}_i(t)} P_j e_j(t), \quad t \geq 1, \ i \in \mathbf{m}. \qquad (7)$$


As a second step, we combine these $m$ update equations into one linear recursion equation
with state vector $e(t) = \operatorname{column}\{e_1(t), e_2(t), \ldots, e_m(t)\}$. To accomplish this, write $A_{\mathbb{N}(t)}$ for
the adjacency matrix of $\mathbb{N}(t)$, $D_{\mathbb{N}(t)}$ for the diagonal matrix whose $i$th diagonal entry is $m_i(t)$
{$m_i(t)$ is also the in-degree of vertex $i$ in $\mathbb{N}(t)$}, and let $F(t) = D_{\mathbb{N}(t)}^{-1} A'_{\mathbb{N}(t)}$. Note that $F(t)$
is a stochastic matrix; in the literature it is sometimes referred to as a flocking matrix. It is
straightforward to verify that

$$e(t+1) = P(F(t) \otimes I)Pe(t), \quad t \geq 1 \qquad (8)$$

where $P$ is the $mn \times mn$ matrix $P = \operatorname{diagonal}\{P_1, P_2, \ldots, P_m\}$ and $F(t) \otimes I$ is the $mn \times mn$
matrix which results when each entry $f_{ij}(t)$ of $F(t)$ is replaced by $f_{ij}(t)$ times the $n \times n$ identity.
Note that $P^2 = P$ because each $P_i$ is idempotent. We will use this fact without special mention
in the sequel.
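
The stacked recursion can be checked against the per-agent form on a small example. The Python sketch below assumes three agents with one row of $A$ each, a fixed directed 3-cycle with self-arcs as the neighbor graph, and a flocking matrix whose $i$th row averages over agent $i$'s neighbors; all data and helper names are hypothetical.

```python
import numpy as np


def kernel_projector(A_i):
    """Orthogonal projector onto ker(A_i) (illustrative helper)."""
    A_i = np.atleast_2d(np.asarray(A_i, dtype=float))
    return np.eye(A_i.shape[1]) - np.linalg.pinv(A_i) @ A_i


# Hypothetical data: Ax = b has a unique solution, each agent holds one row.
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
m, n = A.shape
projs = [kernel_projector(A[i]) for i in range(m)]

P = np.zeros((m * n, m * n))
for i, P_i in enumerate(projs):
    P[i * n:(i + 1) * n, i * n:(i + 1) * n] = P_i

# Fixed neighbor sets (self included): a strongly connected directed 3-cycle.
neighbors = {0: [0, 2], 1: [1, 0], 2: [2, 1]}
F = np.zeros((m, m))
for i, N_i in neighbors.items():
    F[i, N_i] = 1.0 / len(N_i)           # row i averages over agent i's neighbors

M = P @ np.kron(F, np.eye(n)) @ P         # one step of the stacked recursion (8)

rng = np.random.default_rng(0)
e = P @ rng.standard_normal(m * n)        # initial errors with e_i in image P_i
history = [np.linalg.norm(e)]
for t in range(60):
    blocks = e.reshape(m, n)
    # Per-agent form (7), used here only to cross-check the stacked form.
    per_agent = np.concatenate([
        projs[i] @ sum(projs[j] @ blocks[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(m)
    ])
    e = M @ e
    assert np.allclose(e, per_agent)
    history.append(np.linalg.norm(e))

print(history[0], history[-1])
```

In this example the error norm decays geometrically, as Theorem 1 predicts for a strongly connected neighbor graph.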


A. Repeatedly Jointly Strongly Connected Sequences are Sufficient 

In this section we shall prove Theorem 1. In other words we will show that repeatedly jointly
strongly connected sequences of graphs are sufficient for exponential convergence of the $x_i$
to a solution to $Ax = b$. We will do this in two parts. First we will consider the special
case when $Ax = b$ has a unique solution. This case is exactly when $\bigcap_{i=1}^m \ker A_i = 0$. Since
$\ker A_i = \mathcal{P}_i$, $i \in \mathbf{m}$, the uniqueness assumption is equivalent to the condition

$$\bigcap_{i=1}^m \mathcal{P}_i = 0. \qquad (9)$$

Assuming $Ax = b$ has a unique solution, our goal is to derive conditions under which $e \to 0$
since, in view of (6), this will imply that all $x_i$ approach the desired solution $x^*$ in the
limit as $t \to \infty$. To accomplish this it is clearly enough to prove that the matrix product
$(P(F(t) \otimes I)P) \cdots (P(F(2) \otimes I)P)(P(F(1) \otimes I)P)$ converges to the zero matrix exponentially
fast under the hypothesis of Theorem 1. Convergence of such matrix products is an immediate
consequence of the main technical result of this paper, namely Theorem 3, which we provide
below.
To state Theorem 3, we need a way to quantify the sizes of matrix products of the form
$(P(F(t) \otimes I)P) \cdots (P(F(2) \otimes I)P)(P(F(1) \otimes I)P)$. For this purpose we introduce a somewhat
unusual but very useful concept, namely a special “mixed-matrix” norm: Let $|\cdot|_2$ and $|\cdot|_\infty$
denote the standard induced two norm and infinity norm respectively and write $\mathbb{R}^{mn \times mn}$ for the
vector space of all $m \times m$ block matrices $Q = [Q_{ij}]$ whose $ij$th entry is a matrix $Q_{ij} \in \mathbb{R}^{n \times n}$.
We define the mixed matrix norm of $Q \in \mathbb{R}^{mn \times mn}$, written $\|Q\|$, to be

$$\|Q\| = |\langle Q \rangle|_\infty \qquad (10)$$

where $\langle Q \rangle$ is the matrix in $\mathbb{R}^{m \times m}$ whose $ij$th entry is $|Q_{ij}|_2$. It is very easy to verify that $\|\cdot\|$
is in fact a norm. It is even sub-multiplicative {cf. Lemma 3}.
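
As a concrete illustration of (10), the Python sketch below computes the mixed matrix norm of a block matrix and spot-checks sub-multiplicativity on random data. The function name `mixed_norm` and the block sizes are assumptions made for the example.

```python
import numpy as np


def mixed_norm(Q, m, n):
    """Infinity norm of the m x m matrix of block two-norms, as in (10)."""
    Q = np.asarray(Q, dtype=float)
    blocks = Q.reshape(m, n, m, n).swapaxes(1, 2)        # blocks[i, j] is Q_ij
    block_norms = np.array([[np.linalg.norm(blocks[i, j], 2) for j in range(m)]
                            for i in range(m)])
    return float(np.linalg.norm(block_norms, np.inf))    # max absolute row sum


m, n = 3, 2
rng = np.random.default_rng(1)
A = rng.standard_normal((m * n, m * n))
B = rng.standard_normal((m * n, m * n))
print(mixed_norm(A @ B, m, n), mixed_norm(A, m, n) * mixed_norm(B, m, n))
```

The first printed value should not exceed the second, in line with Lemma 3.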

To state Theorem 3 we also need the following idea. Let $l$ be a positive integer. A compact
subset $\mathcal{C}$ of $m \times m$ stochastic matrices with graphs in $\mathcal{G}_{sa}$ is $l$-compact if the set $\mathcal{C}_l$ consisting of
all sequences $S_1, S_2, \ldots, S_l$, $S_i \in \mathcal{C}$, for which the graph $\gamma(S_l S_{l-1} \cdots S_1)$ is strongly connected,
is nonempty and compact. Thus any nonempty compact subset of $m \times m$ stochastic matrices
with strongly connected graphs in $\mathcal{G}_{sa}$ is 1-compact. Some examples of compact subsets which
are $l$-compact are discussed on page 595 of [35].

The key technical result we will need is as follows. 

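Theorem 3: Suppose that (9) holds. Let $l$ be a positive integer. Let $\mathcal{C}$ be an $l$-compact subset
of $m \times m$ stochastic matrices and define

$$\lambda = \left( \sup_{\mathcal{H}_\omega \in \mathcal{C}_l} \ \sup_{\mathcal{H}_{\omega-1} \in \mathcal{C}_l} \cdots \sup_{\mathcal{H}_1 \in \mathcal{C}_l} \|P(Q_{\omega l} \otimes I)P(Q_{\omega l - 1} \otimes I) \cdots P(Q_1 \otimes I)P\| \right)^{\frac{1}{\omega l}}$$

where $\omega = (m-1)^2$ and for $i \in \{1, 2, \ldots, \omega\}$, $\mathcal{H}_i$ is the subsequence $Q_{(i-1)l+1}, Q_{(i-1)l+2}, \ldots, Q_{il}$.
Then $\lambda < 1$, and for any infinite sequence of stochastic matrices $S_1, S_2, \ldots$ in $\mathcal{C}$ whose graphs
form a sequence $\gamma(S_1), \gamma(S_2), \ldots$ which is repeatedly jointly strongly connected by contiguous
subsequences of length $l$, the following inequality holds.

$$\|P(S_t \otimes I)P(S_{t-1} \otimes I) \cdots P(S_1 \otimes I)P\| \leq \lambda^{(t - l\omega)} \qquad (11)$$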


The ideas upon which Theorem 3 depends are actually pretty simple. One breaks the infinite
product

$$\cdots P(S_t \otimes I)P(S_{t-1} \otimes I) \cdots P(S_1 \otimes I)P$$

into contiguous sub-products $P(S_{kl} \otimes I)P(S_{kl-1} \otimes I) \cdots P(S_{(k-1)l+1} \otimes I)P$, $k \geq 1$, of length $l$ with
$l$ chosen long enough so that each sub-product is a contraction in the mixed matrix norm
{Proposition 2}. Then using the sub-multiplicative property of the mixed matrix norm {Lemma
3}, one immediately obtains (11). This theorem will be proved in §VI-C3.

Next we will consider the general case in which (9) is not presumed to hold. This is the case
when $Ax = b$ does not have a unique solution. We will deal with this case in several steps.
First we will {in effect} “quotient out” the subspace $\bigcap_{i=1}^m \mathcal{P}_i$, thereby obtaining a subsystem to
which Theorem 3 can be applied. Second we will show that the part of the system state we
didn’t consider in the first step satisfies a standard unconstrained consensus update equation to
which well known convergence results can be directly applied. The first step makes use of the
following lemma.

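Lemma 1: Let $Q'$ be any matrix whose columns form an orthonormal basis for the orthogonal
complement of the subspace $\bigcap_{i=1}^m \mathcal{P}_i$ and define $\bar{P}_i = Q P_i Q'$, $i \in \mathbf{m}$. Then the following
statements are true.

1. Each $\bar{P}_i$, $i \in \mathbf{m}$, is an orthogonal projection matrix.

2. Each $\bar{P}_i$, $i \in \mathbf{m}$, satisfies $Q P_i = \bar{P}_i Q$.

3. $\bigcap_{i=1}^m \bar{\mathcal{P}}_i = 0$, where $\bar{\mathcal{P}}_i = \operatorname{image} \bar{P}_i$.

Proof of Lemma 1: Note that $\bar{P}_i^2 = Q P_i Q' Q P_i Q' = Q P_i^2 Q' = Q P_i Q' = \bar{P}_i$, $i \in \mathbf{m}$, so each $\bar{P}_i$
is idempotent; since each $\bar{P}_i$ is clearly symmetric, each must be an orthogonal projection matrix.
Thus property 1 is true.

Since $\ker Q = \bigcap_{i=1}^m \mathcal{P}_i$, it must be true that $\ker Q \subset \mathcal{P}_i$, $i \in \mathbf{m}$. Thus $P_i \ker Q = \ker Q$, $i \in \mathbf{m}$.
Therefore $Q P_i \ker Q = 0$ so $\ker Q \subset \ker Q P_i$. This plus the fact that $Q$ has linearly independent
rows means that the equation $Q P_i = XQ$ has a unique solution $X$. Clearly $X = Q P_i Q'$, so
$X = \bar{P}_i$. Therefore property 2 is true.

Pick $x \in \bigcap_{i=1}^m \bar{\mathcal{P}}_i$. Then $x \in \bar{\mathcal{P}}_i$, $i \in \mathbf{m}$, so there exist $w_i$ such that $x = \bar{P}_i w_i$, $i \in \mathbf{m}$. Set
$y = Q'x$ in which case $x = Qy$; thus $y = Q'\bar{P}_i w_i$, $i \in \mathbf{m}$. In view of property 2 of Lemma 1,
$y = P_i Q' w_i$, $i \in \mathbf{m}$, so $y \in \bigcap_{i=1}^m \mathcal{P}_i$. Thus $Qy = 0$. But $x = Qy$ so $x = 0$. Therefore property 3
of Lemma 1 is true. ■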
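
The three properties of Lemma 1 can be spot-checked numerically. The Python sketch below assumes a hypothetical $2 \times 3$ matrix $A$ with one row per agent, so that $Ax = b$ has more than one solution, and takes the rows of $Q$ from the singular value decomposition of $A$; this is one convenient choice of orthonormal basis, not necessarily the one intended in the lemma.

```python
import numpy as np

# Hypothetical data: two agents, n = 3, so ker A is one-dimensional.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
n = A.shape[1]
projs = [np.eye(n) - np.outer(a, a) / (a @ a) for a in A]   # P_i onto ker A_i

# Rows of Q: orthonormal basis of the orthogonal complement of ker A, which is
# the intersection of the ker A_i. Taken here from the SVD of A.
_, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))
Q = Vt[:r]

bars = [Q @ P_i @ Q.T for P_i in projs]                       # the matrices P-bar_i

prop1 = all(np.allclose(B, B.T) and np.allclose(B @ B, B) for B in bars)
prop2 = all(np.allclose(Q @ P_i, B @ Q) for P_i, B in zip(projs, bars))
# Property 3: the images of the P-bar_i intersect only at 0, i.e. the stacked
# matrix of the (I - P-bar_i) has full column rank r.
stacked = np.vstack([np.eye(r) - B for B in bars])
prop3 = bool(np.linalg.matrix_rank(stacked) == r)

print(prop1, prop2, prop3)
```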

Proof of Theorem 1: Consider first the case when $Ax = b$ has a unique solution. Thus the
hypothesis of Theorem 3, that (9) hold, is satisfied. Next observe that since directed graphs
in $\mathcal{G}_{sa}$ are bijectively related to flocking matrices, the set of distinct subsequences
$F((k-1)l+1), F((k-1)l+2), \ldots, F(kl)$, $k \geq 1$, encountered along any trajectory of (8) must be
a finite and thus compact set. Moreover for some finite integer $\tau_0 \geq 0$, the composed graphs
$\gamma(F(kl)) \circ \gamma(F(kl-1)) \circ \cdots \circ \gamma(F(l(k-1)+1))$, $k \geq \tau_0$, must be strongly connected because the
neighbor graph sequence $\mathbb{N}(t)$, $t \geq 1$, is repeatedly jointly strongly connected by subsequences
of length $l$ and $\gamma(F(t)) = \mathbb{N}(t)$, $t \geq 1$. Hence Theorem 3 is applicable to the matrix product
$(P(F(t) \otimes I)P) \cdots (P(F(2) \otimes I)P)(P(F(1) \otimes I)P)$. Therefore for suitably defined nonnegative
$\lambda < 1$, this product converges to the zero matrix as fast as $\lambda^t$ converges to 0. This and (8) imply
that $e(t)$ converges to zero just as fast. From this and (6) it follows that each $x_i(t)$ converges
exponentially fast to $x^*$. Therefore Theorem 1 is true for the case when $Ax = b$ has a unique
solution.

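Now consider the case when $Ax = b$ has more than one solution. Note that property 2 of
Lemma 1 implies that $Q P_i P_j = \bar{P}_i \bar{P}_j Q$ for all $i, j \in \mathbf{m}$. Thus if we define $\bar{e}_i = Q e_i$, $i \in \mathbf{m}$,
then from (7)

$$\bar{e}_i(t+1) = \frac{1}{m_i(t)} \bar{P}_i \sum_{j \in \mathcal{N}_i(t)} \bar{P}_j \bar{e}_j(t), \quad t \geq 1, \ i \in \mathbf{m}. \qquad (12)$$

Observe that (12) has exactly the same form as (7) except for the $\bar{P}_i$ which replace the $P_i$.
But in view of Lemma 1, the $\bar{P}_i$ are also orthogonal projection matrices and $\bigcap_{i=1}^m \bar{\mathcal{P}}_i = 0$. Thus
Theorem 3 is also applicable to the system of iterations (12). Therefore $\bar{e}_i \to 0$ exponentially
fast as $t \to \infty$.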

To deal with what is left, define $z_i = e_i - Q'\bar{e}_i$, $i \in \mathbf{m}$. Note that $Q z_i = Q e_i - \bar{e}_i$ so
$Q z_i = 0$, $i \in \mathbf{m}$. Thus $z_i(t) \in \bigcap_{j=1}^m \mathcal{P}_j$, $i \in \mathbf{m}$. Clearly $P_j z_i(t) = z_i(t)$, $i, j \in \mathbf{m}$. Moreover from
property 2 of Lemma 1, $P_i Q' = Q' \bar{P}_i$. These expressions and (12) imply that

$$z_i(t+1) = \frac{1}{m_i(t)} \sum_{j \in \mathcal{N}_i(t)} z_j(t), \quad t \geq 1, \ i \in \mathbf{m}. \qquad (13)$$

These equations are the update equations for the standard unconstrained consensus problem
treated in [35] and elsewhere for the case when the $z_i$ are scalars. It is well known that for the
scalar case, a sufficient condition for all $z_i$ to converge exponentially fast to the same value
is that the neighbor graph sequence $\mathbb{N}(t)$, $t \geq 1$, be repeatedly jointly strongly connected
[35]. But since the vector update (13) decouples into $n$ independent scalar update equations,
the convergence conditions for the scalar equations apply without change to the vector case as
well. Thus all $z_i$ converge exponentially fast to the same limit $z^* \in \bigcap_{i=1}^m \mathcal{P}_i$. So do all of
the $e_i$ since $e_i = z_i + Q'\bar{e}_i$, $i \in \mathbf{m}$, and all $\bar{e}_i$ converge to zero exponentially fast. Therefore all
$x_i$ converge to the same limit $x^* + z^*$ which solves $Ax = b$. This concludes the
proof of Theorem 1 for the case when $Ax = b$ does not have a unique solution. ■

B. Repeatedly Jointly Strongly Connected Sequences are Necessary 

In this section we shall explain why exponential convergence of the $x_i(t)$ to a solution
can only occur if the sequence of neighbor graphs $\mathbb{N}(t)$, $t > 0$, referred to in the statement of
Theorem 2 is repeatedly jointly strongly connected. To do this we need the following concepts
from [38] and [41]. A vertex $j$ of a directed graph $\mathbb{G}$ is said to be reachable from vertex $i$ if
either $i = j$ or there is a directed path from $i$ to $j$. Vertex $i$ is called essential if it is reachable
from all vertices which are reachable from $i$. It is known that every directed graph has at least
one essential vertex {Lemma 10 of [24]}.

Vertices i and j in G are called mutually reachable if each is reachable from the other. Mutual 
reachability is an equivalence relation on m. Observe that if i is an essential vertex in G, then 
every vertex in the equivalence class of i is essential. Thus each directed graph possesses at 
least one mutually reachable equivalence class whose members are all essential. Note also that 
a strongly connected graph has exactly one mutually reachable equivalence class. 
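
The definitions of reachability, essential vertices and mutually reachable classes translate directly into a short computation. The Python sketch below is illustrative only; it assumes that an arc is stored as a pair `(a, b)` meaning a directed arc from `a` to `b`, and the function names and example graph are hypothetical.

```python
def reachable_from(arcs, i):
    """Set of vertices reachable from i (i itself included) along directed arcs."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for (a, b) in arcs:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def essential_vertices(arcs, m):
    """Vertices i that are reachable from every vertex reachable from i."""
    reach = {i: reachable_from(arcs, i) for i in range(m)}
    return {i for i in range(m) if all(i in reach[j] for j in reach[i])}


def mutually_reachable_classes(arcs, m):
    """Equivalence classes of the mutual reachability relation."""
    reach = {i: reachable_from(arcs, i) for i in range(m)}
    classes = []
    for i in range(m):
        cls = frozenset(j for j in reach[i] if i in reach[j])
        if cls not in classes:
            classes.append(cls)
    return classes


# Hypothetical graph on vertices {0, 1, 2}: arcs 0 -> 1, 1 -> 0 and 1 -> 2.
arcs = {(0, 1), (1, 0), (1, 2)}
print(essential_vertices(arcs, 3))
print(mutually_reachable_classes(arcs, 3))
```

In this example vertex 2 can be reached from 0 and 1 but cannot reach them, so the class {2} is the only one whose members are essential and the graph is not strongly connected.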

Proof of Theorem 2: Consider first the case when $Ax = b$ has a unique solution. In this case, the
unique equilibrium of (8) at the origin must be exponentially stable. Since exponential stability
and uniform asymptotic stability are equivalent properties for linear systems, it is enough to show
that uniform asymptotic stability of (8) implies that the sequence of neighbor graphs $\mathbb{N}(t)$, $t > 0$,
is repeatedly jointly strongly connected. Suppose therefore that (8) is a uniformly asymptotically
stable system.

To show that repeatedly jointly strongly connected sequences are necessary for uniform
asymptotic stability, we suppose the contrary; i.e. suppose that $\mathbb{N}(1), \mathbb{N}(2), \ldots$ is not a repeatedly
jointly strongly connected sequence. Under these conditions, we claim that for every pair of
positive integers $l$ and $\tau$, there is an integer $k > \tau$ such that the composed graph
$\mathbb{N}(k+l-1) \circ \cdots \circ \mathbb{N}(k+1) \circ \mathbb{N}(k)$ is not strongly connected. To justify this claim, suppose that for some pair
$(l, \tau)$, no such $k$ exists; thus for this pair, the graphs $\mathbb{N}(p+l-1) \circ \cdots \circ \mathbb{N}(p+1) \circ \mathbb{N}(p)$, $p > \tau$,
are all strongly connected so the sequence $\mathbb{N}(1), \mathbb{N}(2), \ldots$ must be repeatedly jointly strongly
connected. But this contradicts the hypothesis that $\mathbb{N}(t)$, $t > 0$, is not a repeatedly jointly strongly
connected sequence. Therefore for any pair of positive integers $l$ and $\tau$ there is an integer $k > \tau$
such that the composed graph $\mathbb{N}(k+l-1) \circ \cdots \circ \mathbb{N}(k+1) \circ \mathbb{N}(k)$ is not strongly connected.

Let $\Phi(t, \tau)$ be the state transition matrix of $P(F(t) \otimes I)P$. Since (8) is uniformly asymptotically
stable, for each real number $\epsilon > 0$ there exist positive integers $t_\epsilon$ and $T_\epsilon$ such that $\|\Phi(t + T_\epsilon, t)\| < \epsilon$
for all $t > t_\epsilon$. Set $\epsilon = 1$ and let $t_1$ and $T_1$ be any pair of such integers. Since $\mathbb{N}(1), \mathbb{N}(2), \ldots$
is not a repeatedly jointly strongly connected sequence, there must be an integer $t_2 > t_1$ for which the
composed graph

$$\mathbb{G} = \mathbb{N}(t_2 + T_1 - 1) \circ \cdots \circ \mathbb{N}(t_2 + 1) \circ \mathbb{N}(t_2)$$

is not strongly connected. Since $t_2 > t_1$, the hypothesis of uniform asymptotic stability ensures
that

$$\|\Phi(t_2 + T_1, t_2)\| < 1. \qquad (14)$$

In view of the discussion just before the proof of Theorem 2, $\mathbb{G}$ must have at least one
mutually reachable equivalence class $\mathcal{E}$ whose members are all essential. Note that if $\mathcal{E}$ were
equal to $\mathbf{m}$, then $\mathbb{G}$ would have to be strongly connected. But $\mathbb{G}$ is not strongly connected so $\mathcal{E}$
must be a strictly proper subset of $\mathbf{m}$ with $k < m$ elements. Suppose that $\mathcal{E} = \{v_1, v_2, \ldots, v_k\}$
and let $\bar{\mathcal{E}} = \{v_{k+1}, \ldots, v_m\}$ be the complement of $\mathcal{E}$ in $\mathbf{m}$. Since every vertex in $\mathcal{E}$ is essential,
there are no arcs in $\mathbb{G}$ from $\mathcal{E}$ to $\bar{\mathcal{E}}$. But the arcs of each $\mathbb{N}(t)$, $t \in \{t_2, t_2+1, \ldots, t_2+T_1-1\}$,
must all be arcs in $\mathbb{G}$ because each $\mathbb{N}(t)$ has self-arcs at all vertices. Therefore there cannot be
an arc from $\mathcal{E}$ to $\bar{\mathcal{E}}$ in any $\mathbb{N}(t)$, $t \in \{t_2, t_2+1, \ldots, t_2+T_1-1\}$.

Let $\pi$ be a permutation on $\mathbf{m}$ for which $\pi(v_j) = j$, $j \in \mathbf{m}$, and let $Q$ be the corresponding
permutation matrix. Then for $t \in \{t_2, t_2+1, \ldots, t_2+T_1-1\}$, the transformation $F(t) \mapsto QF(t)Q'$
block triangularizes $F(t)$. Set $\bar{Q} = Q \otimes I$. Note that $\bar{Q}$ is a permutation matrix and that $\bar{Q}P\bar{Q}'$
is a block diagonal, orthogonal projection matrix whose $j$th diagonal block is $P_{\pi(v_j)}$, $j \in \mathbf{m}$.
Because each $QF(t)Q'$ is block triangular, so are the matrices $\bar{Q}P(F(t) \otimes I)P\bar{Q}'$,
$t \in \{t_2, t_2+1, \ldots, t_2+T_1-1\}$. Thus for $t \in \{t_2, t_2+1, \ldots, t_2+T_1-1\}$, there are matrices $A(t)$, $B(t)$ and
$C(t)$ such that

$$\bar{Q}P(F(t) \otimes I)P\bar{Q}' = \begin{bmatrix} A(t) & B(t) \\ 0 & C(t) \end{bmatrix}.$$

Let $k$ be the number of elements in $\mathcal{E}$. For $t \in \{t_2, t_2+1, \ldots, t_2+T_1-1\}$, let $S(t)$ be that
$(m-k) \times (m-k)$ submatrix of $F(t)$ whose $ij$th entry is the $v_{i+k}v_{j+k}$th entry of $F(t)$, for all
$i, j \in \{1, 2, \ldots, m-k\}$. In other words, $S(t)$ is that submatrix of $F(t)$ obtained by deleting rows
and columns whose indices are in $\mathcal{E}$. Since each $F(t)$, $t \in \{t_2, t_2+1, \ldots, t_2+T_1-1\}$, is a stochastic
matrix and there are no arcs from $\mathcal{E}$ to $\bar{\mathcal{E}}$, each corresponding $S(t)$ is a stochastic matrix as
well. Set $\bar{P} = \text{block diagonal}\{P_{v_{k+1}}, P_{v_{k+2}}, \ldots, P_{v_m}\}$ in which case $C(t) = \bar{P}(S(t) \otimes I)\bar{P}$. Since
$\{P_1, P_2, \ldots, P_m\}$ is a non-redundant set and $\bar{\mathcal{E}}$ is a strictly proper subset of $\mathbf{m}$, $\bigcap_{i \in \bar{\mathcal{E}}} \mathcal{P}_i \neq 0$. Let $z$
be any nonzero vector in $\bigcap_{i \in \bar{\mathcal{E}}} \mathcal{P}_i$, in which case $P_i z = z$, $i \in \bar{\mathcal{E}}$. Then $C(t)\bar{z} = \bar{P}(S(t) \otimes I)\bar{P}\bar{z} = \bar{z}$
where $\bar{z} = [z' \ z' \ \cdots \ z']'_{(m-k)n \times 1}$. Note that

$$\bar{Q}\Phi(t_2+T_1, t_2)\bar{Q}' = (\bar{Q}P(F(t_2+T_1-1) \otimes I)P\bar{Q}') \cdots (\bar{Q}P(F(t_2) \otimes I)P\bar{Q}') = \begin{bmatrix} \bar{A} & \bar{B} \\ 0 & \bar{C} \end{bmatrix}$$

where $\bar{C} = C(t_2+T_1-1) \cdots C(t_2)$. Therefore $\bar{C}\bar{z} = \bar{z}$ so $\bar{C}$ has an eigenvalue at 1. Thus the
state transition matrix $\Phi(t_2+T_1, t_2)$ has an eigenvalue at 1 so $\|\Phi(t_2+T_1, t_2)\| = 1$.
But this contradicts (14). It follows that the sequence $\mathbb{N}(1), \mathbb{N}(2), \ldots$ must be repeatedly jointly
strongly connected if $Ax = b$ has a unique solution.

We now turn to the general case in which $Ax = b$ has more than one solution. Since by
assumption $A \neq 0$, the matrix $Q$ defined in the statement of Lemma 1 is not the zero matrix
and so the subsystem defined by (12) has positive state space dimension. Moreover, exponential
convergence of the overall system implies that this subsystem’s unique equilibrium at the origin
is exponentially stable. Thus the preceding arguments apply so the sequence of neighbor graphs
must be repeatedly jointly strongly connected in this case too. ■

C. Justification for Theorem 3

In this section we develop the ideas needed to prove Theorem 3. We begin with the following
lemma which provides several elementary but useful facts about orthogonal projection matrices.


Lemma 2: For any nonempty set of $n \times n$ real orthogonal projection matrices $\{P_1, P_2, \ldots, P_k\}$

$$|P_k P_{k-1} \cdots P_1|_2 \leq 1. \qquad (15)$$

Moreover,

$$|P_k P_{k-1} \cdots P_1|_2 < 1 \qquad (16)$$

if and only if

$$\bigcap_{i=1}^k \mathcal{P}_i = 0. \qquad (17)$$


Proof of Lemma 2: To avoid cumbersome notation, throughout this proof we drop the subscript
2 and write $|\cdot|$ for $|\cdot|_2$. To establish (15), we make use of the fact that the eigenvalues of any
projection matrix are either 0 or 1. But the projection matrices of interest here are orthogonal
and thus symmetric. Therefore each singular value of each $P_i$ must be either 0 or 1. It follows
that $|P_i| \leq 1$, $i \in \mathbf{k}$. The inequality in (15) follows at once from the fact that $|\cdot|$ is sub-multiplicative.

To prove the equivalence of (16) and (17), suppose first that (16) holds. Let $x$ be any vector
in $\bigcap_{i=1}^k \mathcal{P}_i$. Then $P_k P_{k-1} \cdots P_1 x = x$. Since (16) holds, $P_k P_{k-1} \cdots P_1$ must be a discrete time
stability matrix. Therefore $P_k P_{k-1} \cdots P_1$ cannot have an eigenvalue at 1 so $x = 0$. It follows
that (17) is true.

To proceed we will first need to justify the following claim: If $\{Q_1, Q_2, \ldots, Q_s\}$ is any
nonempty subset of $s \leq m$ projection matrices from $\{P_1, P_2, \ldots, P_k\}$ and $x \in \mathbb{R}^n$ is any vector
for which $|Q_1 \cdots Q_{s-1} Q_s x| = |x|$, then $Q_i x = x$, $i \in \{1, 2, \ldots, s\}$. To prove this claim, suppose
first that $Q \in \{P_1, P_2, \ldots, P_k\}$ and that $|Qx| = |x|$ for some $x \in \mathbb{R}^n$. Write $x = y + z$ where
$y \in \mathcal{Q}$ and $z \in \mathcal{Q}^\perp$, $\mathcal{Q}$ being the image of $Q$. Then $Qx = y$ so $|y| = |x|$. But $|y|^2 + |z|^2 = |x|^2$ so $z = 0$. Therefore
$Qx = x$ so the claim is true for $s = 1$.

Now fix $q < k$ and suppose that the claim is true for every value of $s \leq q$. Let $x$ be a vector
for which $|Q_1 \cdots Q_q Q_{q+1} x| = |x|$. Then $|x| = |Q_1 \cdots Q_q Q_{q+1} x| \leq |Q_{q+1} x| \leq |x|$ because $|\cdot|$ is
sub-multiplicative and because (15) holds for any nonempty set of projection matrices. Clearly
$|Q_{q+1} x| = |x|$; therefore $Q_{q+1} x = x$ because the claim is true for single projection matrices.
Therefore $Q_1 \cdots Q_q Q_{q+1} x = Q_1 \cdots Q_q x$ so $|Q_1 \cdots Q_q x| = |x|$. From this and the inductive
hypothesis it follows that $Q_i x = x$, $i \in \{1, 2, \ldots, q\}$. Thus the claim is true for all $s \leq q + 1$.
It follows by induction that the claim is true.

To complete the proof, suppose that (17) holds and let $x$ be any vector for which $|P_k P_{k-1} \cdots P_1 x| = |x|$.
In view of the preceding claim, $P_i x = x$, $i \in \{1, 2, \ldots, k\}$. This implies that $x \in \bigcap_{i=1}^k \mathcal{P}_i$, and
thus because of (17) that $x = 0$. Thus $P_k P_{k-1} \cdots P_1$ cannot have a singular value at 1. This and
(15) imply that (16) is true. ■
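
Lemma 2 is easy to illustrate numerically. In the Python sketch below, the projectors onto two distinct lines in the plane have trivial intersection and a product of norm less than 1, whereas projectors onto two coordinate planes in three dimensions share a line and give a product of norm exactly 1. The specific projectors are hypothetical examples.

```python
import numpy as np


def line_projector(theta):
    """Orthogonal projector onto the line in R^2 at angle theta."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)


# Two distinct lines in R^2: images intersect only at 0.
P1, P2 = line_projector(0.0), line_projector(np.pi / 3)
norm_lines = np.linalg.norm(P2 @ P1, 2)          # equals |cos(pi/3)|

# Coordinate planes in R^3: any two share a line, all three share only 0.
planes = [np.eye(3) - np.outer(e, e) for e in np.eye(3)]
norm_two_planes = np.linalg.norm(planes[1] @ planes[0], 2)
norm_three_planes = np.linalg.norm(planes[2] @ planes[1] @ planes[0], 2)

print(norm_lines, norm_two_planes, norm_three_planes)
```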

1) Projection Matrix Polynomials: To proceed we need to develop a language for talking
about matrix products of the form $(P(S_q \otimes I)P) \cdots (P(S_2 \otimes I)P)(P(S_1 \otimes I)P)$ where the $S_i$ are
$m \times m$ stochastic matrices. Such matrices can be viewed as partitioned matrices whose $m^2$ blocks
are specially structured $n \times n$ matrices. We begin by introducing some concepts appropriate to
the individual blocks.

Let $\{P_1, P_2, \ldots, P_m\}$ be a set of $n \times n$ orthogonal projection matrices. We will be interested
in matrices of the form

$$\mu(P_1, P_2, P_3, \ldots, P_m) = \sum_{i=1}^d \lambda_i P_{h_i(1)} P_{h_i(2)} \cdots P_{h_i(q_i)} \qquad (18)$$

where $q_i$ and $d$ are positive integers, $\lambda_i$ is a real positive number, and for each $j \in \{1, 2, \ldots, q_i\}$,
$h_i(j)$ is an integer in $\{1, 2, \ldots, m\}$. We call such matrices together with the $n \times n$ zero matrix,
projection matrix polynomials. In the event $\mu$ is nonzero, we refer to the $\lambda_i$ as the coefficients of
$\mu$. Note that each $n \times n$ block of any partitioned matrix of the form
$(P(S_q \otimes I)P) \cdots (P(S_2 \otimes I)P)(P(S_1 \otimes I)P)$ is a projection matrix polynomial. The set of projection matrix polynomials,
written $\mathscr{P}$, is clearly closed under matrix addition and multiplication. Let us note from the triangle
inequality, that

$$|\mu(P_1, P_2, P_3, \ldots, P_m)|_2 \leq \sum_{i=1}^d \lambda_i |P_{h_i(1)} P_{h_i(2)} \cdots P_{h_i(q_i)}|_2.$$

From this and (15) it follows that

$$|\mu(P_1, P_2, P_3, \ldots, P_m)|_2 \leq \lceil \mu(P_1, P_2, P_3, \ldots, P_m) \rceil \qquad (19)$$

where $\lceil \mu(P_1, P_2, P_3, \ldots, P_m) \rceil = \sum_{i=1}^d \lambda_i$ if $\mu \neq 0$ and $\lceil \mu \rceil = 0$ if $\mu = 0$. We call $\lceil \mu \rceil$ the
nominal bound of $\mu$. Notice that the actual 2 norm of $\mu$ will be strictly less than its nominal
bound provided at least one “component” of $\mu$ has a 2 norm less than one, where by a component
of $\mu$ we mean any matrix product $P_{h_i(1)} P_{h_i(2)} \cdots P_{h_i(q_i)}$ appearing in the sum in (18) which defines
$\mu$. In view of Lemma 2, a sufficient condition for $P_{h_i(1)} P_{h_i(2)} \cdots P_{h_i(q_i)}$ to have a 2 norm less
than 1 is that

$$\bigcap_{j=1}^{q_i} \operatorname{image} P_{h_i(j)} = 0.$$

Thus if $\bigcap_{i=1}^m \mathcal{P}_i = 0$, this in turn will always be true if each of the projection matrices in the
set $\{P_1, P_2, \ldots, P_m\}$ appears in the component $P_{h_i(1)} P_{h_i(2)} \cdots P_{h_i(q_i)}$ at least once. Prompted
by this we say that a nonzero projection matrix polynomial $\mu(P_1, P_2, P_3, \ldots, P_m)$ is complete
if it has a component $P_{h_i(1)} P_{h_i(2)} \cdots P_{h_i(q_i)}$ within which each of the projection matrices
$P_j$, $j \in \{1, 2, \ldots, m\}$, appears at least once. Assuming $\bigcap_{i=1}^m \mathcal{P}_i = 0$, complete projection matrix
polynomials are thus a class of projection matrix polynomials with 2-norms strictly less than
their nominal bounds. The converse of course is not necessarily so.
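
The relation between a projection matrix polynomial, its nominal bound and completeness can be illustrated with a small computation. The Python sketch below represents a polynomial as a list of terms `(coefficient, index_sequence)`; this representation, the helper names and the two projectors are assumptions made only for the example.

```python
from functools import reduce

import numpy as np


def evaluate(terms, projs):
    """Sum of coef * P_{h(1)} ... P_{h(q)} over terms [(coef, (h(1), ..., h(q))), ...]."""
    n = projs[0].shape[0]
    total = np.zeros((n, n))
    for coef, idx in terms:
        total += coef * reduce(np.matmul, [projs[h] for h in idx])
    return total


def nominal_bound(terms):
    """Sum of the coefficients, i.e. the right-hand side of (19)."""
    return sum(coef for coef, _ in terms)


def is_complete(terms, m):
    """True if some component contains every projector index at least once."""
    return any(set(idx) == set(range(m)) for _, idx in terms)


def line_projector(theta):
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)


# Two projectors onto distinct lines in R^2, so their images intersect only at 0.
projs = [line_projector(0.0), line_projector(np.pi / 4)]

complete = [(0.5, (0,)), (0.5, (0, 1))]      # 0.5 P_1 + 0.5 P_1 P_2
incomplete = [(0.5, (0,)), (0.5, (0, 0))]    # 0.5 P_1 + 0.5 P_1 P_1

for terms in (complete, incomplete):
    print(is_complete(terms, 2), nominal_bound(terms),
          np.linalg.norm(evaluate(terms, projs), 2))
```

Both polynomials have nominal bound 1, but only the complete one has a 2-norm strictly below it.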
2) Projection Block Matrices: The ideas just discussed extend in a natural way to “projection
block matrices.” By an $m \times m$ projection block matrix is meant a block partitioned matrix of
the form

$$M = [\,\mu_{ij}(P_1, P_2, \ldots, P_m)\,]_{m \times m}.$$

An $m \times m$ projection block matrix is thus an $nm \times nm$ matrix of real numbers partitioned into
$n \times n$ sub-matrices which are projection matrix polynomials. The set of all $m \times m$ projection
block matrices, written $\mathscr{P}^{m \times m}$, is clearly closed under multiplication. Note that any matrix of
the form $(P(S_q \otimes I)P) \cdots (P(S_2 \otimes I)P)(P(S_1 \otimes I)P)$ is a projection block matrix.

By the nominal bound of $M = [\,\mu_{ij}(P_1, P_2, \ldots, P_m)\,]_{m \times m} \in \mathscr{P}^{m \times m}$, written $\lceil M \rceil$, is meant
the $m \times m$ matrix whose $ij$th entry is the nominal bound of $\mu_{ij}(P_1, P_2, \ldots, P_m)$. Using (19) it
is quite easy to verify that

$$\langle M \rangle \leq \lceil M \rceil \qquad (20)$$

where the inequality is intended entry-wise. The definition of nominal bound of a projection
matrix polynomial implies that for all $\mu_1, \mu_2 \in \mathscr{P}$, $\lceil \mu_1 \mu_2 \rceil = \lceil \mu_1 \rceil \lceil \mu_2 \rceil$ and
$\lceil \mu_1 + \mu_2 \rceil = \lceil \mu_1 \rceil + \lceil \mu_2 \rceil$. From this it follows that

$$\lceil M_1 M_2 \rceil = \lceil M_1 \rceil \lceil M_2 \rceil, \quad M_1, M_2 \in \mathscr{P}^{m \times m}. \qquad (21)$$

In order to measure the sizes of matrices in $\mathscr{P}^{m \times m}$ we shall make use of the mixed matrix
norm $\|\cdot\|$ defined earlier in (10). A critical property of this norm is that it is sub-multiplicative:

Lemma 3:

$$\|AB\| \leq \|A\| \|B\|, \quad \forall A, B \in \mathbb{R}^{mn \times mn}.$$


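Proof of Lemma 3: Note first that

$$\langle AB \rangle = \left[\, \left| \sum_{k=1}^m A_{ik} B_{kj} \right|_2 \,\right]_{m \times m}.$$

But $\left| \sum_{k=1}^m A_{ik} B_{kj} \right|_2 \leq \sum_{k=1}^m |A_{ik} B_{kj}|_2$ and $|A_{ik} B_{kj}|_2 \leq |A_{ik}|_2 |B_{kj}|_2$ so

$$\left| \sum_{k=1}^m A_{ik} B_{kj} \right|_2 \leq \sum_{k=1}^m |A_{ik}|_2 |B_{kj}|_2 = \begin{bmatrix} |A_{i1}|_2 & |A_{i2}|_2 & \cdots & |A_{im}|_2 \end{bmatrix} \begin{bmatrix} |B_{1j}|_2 \\ |B_{2j}|_2 \\ \vdots \\ |B_{mj}|_2 \end{bmatrix}.$$

Clearly $\langle AB \rangle \leq \langle A \rangle \langle B \rangle$. It follows from this and the fact that the infinity norm is
sub-multiplicative that $|\langle AB \rangle|_\infty \leq |\langle A \rangle|_\infty |\langle B \rangle|_\infty$. Thus the lemma is true. ■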

It is worth noting that the preceding properties of $\|\cdot\|$ remain true for any pair of standard
matrix norms provided both are sub-multiplicative. It is conceivable that the mixed matrix norm
which results when the 1-norm is used in place of the 2-norm will find application in the study
of distributed compressed sensing algorithms [42]. The notion of a mixed matrix norm has been
used before although references to the subject are somewhat obscure.

Let $M = [\,\mu_{ij}\,]_{m \times m}$ be a matrix in $\mathscr{P}^{m \times m}$. Since $\langle M \rangle = [\,|\mu_{ij}|_2\,]_{m \times m}$, it is possible to rewrite
(20) as

$$\langle M \rangle \leq \lceil M \rceil, \quad M \in \mathscr{P}^{m \times m}. \qquad (22)$$

Therefore

$$\|M\| \leq |\lceil M \rceil|_\infty, \quad M \in \mathscr{P}^{m \times m}. \qquad (23)$$

Thus in the case when $\lceil M \rceil$ turns out to be a stochastic matrix, $\|M\| \leq 1$. In other words, when
$\lceil M \rceil$ is a stochastic matrix, $M$ is non-expansive. As will soon become clear, this is exactly the
case we are interested in.

What we are especially interested in are conditions under which $M$ is a contraction in the
mixed matrix norm we have been discussing under the assumption that $\bigcap_{i=1}^m \mathcal{P}_i = 0$. Towards
this end, let us note first that the sum of the terms in any given row $i$ of $\langle M \rangle$ will be strictly
less than the sum of the terms in row $i$ of $\lceil M \rceil$ provided at least one sub-matrix $\mu_{ij}$ in block
row $i$ of $M$ is complete. It follows at once that the inequality in (23) will be strict if every row
of $M$ has this property. We have proved the following proposition.

Proposition 1: Any matrix $M$ in $\mathscr{P}^{m \times m}$ whose nominal bound is stochastic is non-expansive
in the mixed matrix norm. If, in addition, $\bigcap_{i=1}^m \mathcal{P}_i = 0$ and at least one entry in each block row
of $M$ is complete, then $M$ is a contraction in the mixed matrix norm.

3) Technical Results: We now return to the study of matrix products of the form
$P(S_q \otimes I)P(S_{q-1} \otimes I) \cdots P(S_1 \otimes I)P$ where $P = \operatorname{diagonal}\{P_1, P_2, \ldots, P_m\}$, each $S_i$ is an $m \times m$ stochastic
matrix, and $I$ is the $n \times n$ identity. As noted earlier, each such matrix product is a projection
block matrix in $\mathscr{P}^{m \times m}$. Our goal is to state a sufficient condition under which any such matrix
product is a contraction in the mixed matrix norm. To do this let us note first that

$$\lceil P(S_q \otimes I)P(S_{q-1} \otimes I) \cdots P(S_1 \otimes I)P \rceil = S_q S_{q-1} \cdots S_1 \qquad (24)$$

because of (21) and the fact that $\lceil P(S \otimes I)P \rceil = S$ for any stochastic matrix $S$. Thus in view of
Proposition 1, $P(S_q \otimes I)P(S_{q-1} \otimes I) \cdots P(S_1 \otimes I)P$ will be a contraction assuming (9) holds,
if each of its block rows contains an entry which is complete.

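To proceed we need to generalize the idea of a repeatedly jointly strongly connected sequence
to sequences of finite length. A finite sequence of graphs $\mathbb{G}_1, \mathbb{G}_2, \ldots, \mathbb{G}_l$ in $\mathcal{G}_{sa}$ is $l$-connected if
the composed graph $\mathbb{G}_l \circ \mathbb{G}_{l-1} \circ \cdots \circ \mathbb{G}_1$ is strongly connected. More generally, a finite sequence
$\mathbb{G}_1, \mathbb{G}_2, \ldots, \mathbb{G}_p$ is repeatedly $l$-connected for some positive integer $l$, if each of the composed
graphs $\mathbb{H}_k = \mathbb{G}_{kl} \circ \mathbb{G}_{kl-1} \circ \cdots \circ \mathbb{G}_{(k-1)l+1}$, $k \in \mathbf{q}$, is strongly connected; here $q$ is the unique integer
quotient of $p$ divided by $l$. Note that if $\mathbb{G}_1, \mathbb{G}_2, \ldots, \mathbb{G}_p$ is such a sequence, the composed graph
$\mathbb{H} = \mathbb{G}_p \circ \mathbb{G}_{p-1} \circ \cdots \circ \mathbb{G}_{l(q-1)+1}$ is also strongly connected because $\mathbb{H} = \mathbb{G}_p \circ \mathbb{G}_{p-1} \circ \cdots \circ \mathbb{G}_{lq+1} \circ \mathbb{H}_q$
and because in $\mathcal{G}_{sa}$, the arc sets of any two graphs are contained in the arc set of their composition.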
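
The notion of a repeatedly $l$-connected sequence can be checked mechanically from adjacency matrices. The Python sketch below assumes graphs with self-arcs stored as 0/1 adjacency matrices, and composes two graphs by putting an arc from $i$ to $j$ whenever there is an arc from $i$ to some $k$ in the first graph and from $k$ to $j$ in the second. Since strong connectivity does not depend on arc orientation, the check does not rely on the paper's arc convention. All names and the example graphs are hypothetical.

```python
import numpy as np


def compose(A_first, A_second):
    """Adjacency of the composition: arc i -> j if i -> k in the first graph
    and k -> j in the second, for some vertex k."""
    return ((A_first @ A_second) > 0).astype(int)


def strongly_connected(A):
    """True if every vertex reaches every other vertex within m - 1 steps."""
    m = A.shape[0]
    reach = np.linalg.matrix_power(np.eye(m, dtype=int) + A, m - 1)
    return bool(np.all(reach > 0))


def repeatedly_l_connected(graphs, l):
    """Check each of the q = len(graphs) // l contiguous blocks of length l."""
    q = len(graphs) // l
    for k in range(q):
        H = np.eye(graphs[0].shape[0], dtype=int)
        for A in graphs[k * l:(k + 1) * l]:
            H = compose(H, A)
        if not strongly_connected(H):
            return False
    return q >= 1


# Two graphs on 3 vertices with self-arcs; neither is strongly connected alone.
G1 = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])   # arcs between vertices 0 and 1
G2 = np.array([[1, 0, 0], [0, 1, 1], [0, 1, 1]])   # arcs between vertices 1 and 2

print(strongly_connected(G1), strongly_connected(G2))
print(repeatedly_l_connected([G1, G2, G1, G2], 2))
```

Here the alternating sequence is repeatedly 2-connected although no single graph in it is strongly connected.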

Proposition 2: Suppose that (9) holds. Let $S_1, S_2, \ldots, S_p$ be a finite set of $m \times m$ stochastic
matrices whose graphs form a sequence $\gamma(S_1), \gamma(S_2), \ldots, \gamma(S_p)$ which is repeatedly $l$-connected
for some positive integer $l$. If $p \geq (m-1)^2 l$, then the matrix $P(S_p \otimes I)P(S_{p-1} \otimes I) \cdots P(S_1 \otimes I)P$
is a contraction in the mixed matrix norm.
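
A small numerical experiment shows why a product of several factors is needed. The Python sketch below assumes three agents in three dimensions, each projecting onto a coordinate plane, and a fixed strongly connected graph with self-arcs, so that $l = 1$ and $p = (m-1)^2 = 4$. A single factor $P(S \otimes I)P$ is only non-expansive here, while the product of $p$ factors is a contraction. The data and the helper `mixed_norm` are hypothetical.

```python
import numpy as np


def mixed_norm(Q, m, n):
    """Infinity norm of the m x m matrix of block two-norms, as in (10)."""
    blocks = np.asarray(Q, dtype=float).reshape(m, n, m, n).swapaxes(1, 2)
    block_norms = np.array([[np.linalg.norm(blocks[i, j], 2) for j in range(m)]
                            for i in range(m)])
    return float(np.linalg.norm(block_norms, np.inf))


m = n = 3
# P_i projects onto the coordinate plane orthogonal to the i-th unit vector;
# any two planes share a line, but all three intersect only at 0, so (9) holds.
projs = [np.eye(n) - np.outer(e, e) for e in np.eye(n)]
P = np.zeros((m * n, m * n))
for i, P_i in enumerate(projs):
    P[i * n:(i + 1) * n, i * n:(i + 1) * n] = P_i

# Stochastic matrix whose graph is a directed 3-cycle with self-arcs.
S = 0.5 * np.array([[1.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0]])
step = P @ np.kron(S, np.eye(n))

single = step @ P                                  # P (S x I) P
p = (m - 1) ** 2                                   # with l = 1
product = np.linalg.matrix_power(step, p) @ P      # P (S x I) P ... (S x I) P

print(mixed_norm(single, m, n), mixed_norm(product, m, n))
```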

To prove this proposition we will make use of the following idea. By a route over a given
sequence of graphs $\mathbb{G}_1, \mathbb{G}_2, \ldots, \mathbb{G}_q$ in $\mathcal{G}_{sa}$ is meant a sequence of vertices $i_0, i_1, \ldots, i_q$ such that
for $k \in \mathbf{q}$, $(i_{k-1}, i_k)$ is an arc in $\mathbb{G}_k$. A route over a sequence of graphs which are all the same
graph $\mathbb{G}$ is thus a walk in $\mathbb{G}$.

The definition of a route implies that if $i_0, i_1, \ldots, i_q$ is a route over $\mathbb{G}_1, \mathbb{G}_2, \ldots, \mathbb{G}_q$ and
$i_q, i_{q+1}, \ldots, i_p$ is a route over $\mathbb{G}_{q+1}, \ldots, \mathbb{G}_p$, then the ‘concatenated’ sequence $i_0, i_1, \ldots, i_{q-1}$,
$i_q, i_{q+1}, \ldots, i_p$ is a route over $\mathbb{G}_1, \mathbb{G}_2, \ldots, \mathbb{G}_{q-1}, \mathbb{G}_q, \mathbb{G}_{q+1}, \ldots, \mathbb{G}_p$. This clearly remains true
if more than two sequences are concatenated.

Note that the definition of composition in $\mathcal{G}_{sa}$ implies that if $j = i_0, i_1, \ldots, i_q = i$ is a route
over a sequence $\mathbb{G}_1, \mathbb{G}_2, \ldots, \mathbb{G}_q$, then $(i, j)$ must be an arc in the composed graph
$\mathbb{G}_q \circ \mathbb{G}_{q-1} \circ \cdots \circ \mathbb{G}_1$. The definition of composition also implies the converse, namely that if $(i, j)$ is an arc
in $\mathbb{G}_q \circ \mathbb{G}_{q-1} \circ \cdots \circ \mathbb{G}_1$, then there must exist vertices $i_1, \ldots, i_{q-1}$ for which $j = i_0, i_1, \ldots, i_q = i$
is a route over $\mathbb{G}_1, \mathbb{G}_2, \ldots, \mathbb{G}_q$.

Lemma 4: Let $S_1, S_2, \ldots, S_q$ be a sequence of $m \times m$ stochastic matrices with graphs $G_1, G_2, \ldots, G_q$ in $\mathcal{G}_{sa}$ respectively. If $j = i_0, i_1, \ldots, i_q = i$ is a route over the sequence $G_1, G_2, \ldots, G_q$, then the matrix product $P_{i_q} P_{i_{q-1}} \cdots P_{i_0}$ is a component of the $ij$th block entry of the projection block matrix

$$M = P(S_q \otimes I)P(S_{q-1} \otimes I) \cdots P(S_1 \otimes I)P.$$

Proof of Lemma 4: First suppose $q = 1$, in which case $M = P(S_1 \otimes I)P$. By definition, $(j, i)$ is an arc in $G_1$; therefore $s_{ij} \neq 0$. But the $ij$th block in $M$ is $s_{ij} P_i P_j$. Thus the lemma is true for $q = 1$.

Now suppose that $q > 1$ and that the lemma is true for all $k < q$. Set $A = P(S_q \otimes I)P$ and $B = P(S_{q-1} \otimes I)P(S_{q-2} \otimes I) \cdots P(S_1 \otimes I)P$. Since $P^2 = P$, $M = AB$. Since the lemma is true for $k < q$ and $j = i_0, i_1, i_2, \ldots, i_{q-1}$ is a route over $G_1, G_2, \ldots, G_{q-1}$, the matrix $P_{i_{q-1}} P_{i_{q-2}} \cdots P_{i_0}$ is a component of the projection matrix polynomial entry $b_{i_{q-1} j}$ of $B$. Similarly, the matrix $P_{i_q} P_{i_{q-1}}$ is a component of the $i\, i_{q-1}$th projection matrix polynomial entry $a_{i i_{q-1}}$ of $A$. In general, the product of any component of any nonzero projection matrix polynomial $\alpha$ with any component of any other nonzero projection matrix polynomial $\beta$ is a component of the product $\alpha\beta$. It must therefore be true that $P_{i_q} P_{i_{q-1}} P_{i_{q-1}} P_{i_{q-2}} \cdots P_{i_0}$ is a component of the product $a_{i i_{q-1}} b_{i_{q-1} j}$. But $P_{i_{q-1}}^2 = P_{i_{q-1}}$, so $P_{i_q} P_{i_{q-1}} P_{i_{q-2}} \cdots P_{i_0}$ is a component of $a_{i i_{q-1}} b_{i_{q-1} j}$. In view of the definition of matrix multiplication, the projection matrix polynomial $a_{i i_{q-1}} b_{i_{q-1} j}$ must appear within the sum which defines the $ij$th block entry $\mu_{ij}$ in $M$. Therefore $P_{i_q} P_{i_{q-1}} P_{i_{q-2}} \cdots P_{i_0}$ must be a component of $\mu_{ij}$. Thus the lemma is true at $q$. By induction the lemma is true for all $q > 0$. ■

Proof of Proposition 2: Set $r = m - 1$ and $G_i = \gamma(S_i)$, $i \in \mathbf{p}$. Partition the sequence $G_1, G_2, \ldots, G_p$ into $r$ successive subsequences $\mathcal{G}_1 = \{G_1, G_2, \ldots, G_{rl}\}$, $\mathcal{G}_2 = \{G_{rl+1}, \ldots, G_{2rl}\}$, $\ldots$, $\mathcal{G}_{r-1} = \{G_{(r-2)rl+1}, \ldots, G_{(r-1)rl}\}$, $\mathcal{G}_r = \{G_{(r-1)rl+1}, \ldots, G_p\}$, each of length $rl$ except for the last, which must be of length $p - l(r^2 - r) \ge lr$. Each of these $r$ sequences $\mathcal{G}_i$, $i \in \mathbf{r}$, consists of $r$ successive subsequences which, in turn, are jointly strongly connected. Thus each of the $r$ composed graphs $H_1 = G_{rl} \circ \cdots \circ G_1$, $H_2 = G_{2rl} \circ \cdots \circ G_{rl+1}$, $\ldots$, $H_{r-1} = G_{(r-1)rl} \circ \cdots \circ G_{(r-2)rl+1}$, $H_r = G_p \circ \cdots \circ G_{(r-1)rl+1}$ can be written as the composition of $r$ strongly connected graphs. But the composition of any sequence of $r$ or more strongly connected graphs in $\mathcal{G}_{sa}$ is a complete graph (cf. Proposition 4 of [35]). Thus each of the graphs $H_k$, $k \in \mathbf{r}$, is a complete graph. Therefore each $H_k$ contains every possible arc $(i, j)$. It follows that for any $i, j \in \mathbf{m}$ and any $k \in \mathbf{r}$, there must be a route over the sequence $\mathcal{G}_k$ from $j$ to $i$.

Let $i_1, i_2, \ldots, i_m$ be any reordering of the sequence $1, 2, \ldots, m$. In the light of the discussion in the previous paragraph, it is clear that for each $k \in \{1, 2, \ldots, r-1\}$, there must be a route $i_k = j_{(k-1)rl}, j_{(k-1)rl+1}, \ldots, j_{krl} = i_{k+1}$ over $\mathcal{G}_k$ from $i_k$ to $i_{k+1}$. Similarly there must be a route $i_r = j_{(r-1)rl}, j_{(r-1)rl+1}, \ldots, j_p = i_m$ from $i_r$ to $i_m$ over $\mathcal{G}_r$. Thus $i_1 = j_0, j_1, j_2, \ldots, j_p = i_m$ must be a route over the overall sequence $G_1, G_2, \ldots, G_p$. In view of Lemma 4 the matrix product $P_{j_p} \cdots P_{j_0}$ must be a component of the $i_m i_1$th block entry of

$$M = P(S_p \otimes I)P(S_{p-1} \otimes I) \cdots P(S_1 \otimes I)P.$$

But $i_1, i_2, \ldots, i_m$ are distinct integers and each appears in the sequence $j_0, j_1, \ldots, j_p$ at least once. Therefore the $i_m i_1$th block entry of $M$ is complete. Since this reasoning applies for any sequence of $m$ distinct vertex labels $i_1, i_2, \ldots, i_m$ from the set $\{1, 2, \ldots, m\}$, every block entry of $M$, except for the diagonal blocks, must be a complete projection matrix polynomial. It follows from Proposition 1 and (24) that $M$ is a contraction. ■
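
Two graph-theoretic facts used in the proof above can be checked on a small case: that composing $m - 1$ strongly connected graphs with self-arcs yields a complete graph, and that an arc of a composed graph corresponds to a route over the sequence. The Python sketch below does this for a directed cycle with self-arcs on $m = 5$ vertices. The arc-set representation, the composition convention, and the helper names are assumptions of the sketch, which checks one instance and is not a proof.

```python
def compose(g2, g1):
    """Arc set of g2 o g1: (i, j) is an arc if (i, k) is in g1 and (k, j) is in g2 for some k."""
    return {(i, j) for (i, k) in g1 for (k2, j) in g2 if k == k2}

def compose_all(graphs):
    """Composition G_q o ... o G_1 of a list [G_1, ..., G_q]."""
    h = graphs[0]
    for g in graphs[1:]:
        h = compose(g, h)
    return h

def is_complete(arcs, m):
    return arcs == {(i, j) for i in range(m) for j in range(m)}

def find_route(graphs, start, end):
    """Return a route i_0 = start, ..., i_q = end with (i_{k-1}, i_k) an arc of G_k, or None."""
    paths = {start: [start]}
    for g in graphs:
        nxt = {}
        for (a, b) in g:
            if a in paths and b not in nxt:
                nxt[b] = paths[a] + [b]
        paths = nxt
    return paths.get(end)

m = 5
cycle_graph = {(i, i) for i in range(m)} | {(i, (i + 1) % m) for i in range(m)}

h_short = compose_all([cycle_graph] * (m - 2))   # one factor too few
h_full = compose_all([cycle_graph] * (m - 1))    # m - 1 strongly connected factors
route = find_route([cycle_graph] * (m - 1), 0, 3)

print(is_complete(h_short, m), is_complete(h_full, m), route)
```

For this cycle, $m - 2$ factors are not enough to reach the vertex four steps ahead, so the count $m - 1$ cannot be lowered in this instance.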

Proof of Theorem 3: Let $\mathcal{H}_i = Q_{(i-1)l+1}, \ldots, Q_{il}$, $i \in \{1, 2, \ldots, \omega\}$, be any set of $\omega$ sequences in $\mathcal{C}_l$. Since each $\mathcal{H}_i \in \mathcal{C}_l$, each graph $\gamma(Q_{il} Q_{il-1} \cdots Q_{(i-1)l+1})$, $i \in \{1, 2, \ldots, \omega\}$, is strongly connected. Therefore the sequence $\gamma(Q_1), \gamma(Q_2), \ldots, \gamma(Q_{\omega l})$ is repeatedly $l$-connected. Since there are $\omega l$ matrices in the $Q_i$ sequence, Proposition 2 applies. Therefore for any set of sequences $\mathcal{H}_i \in \mathcal{C}_l$, $i \in \{1, 2, \ldots, \omega\}$, $\|P(Q_{\omega l} \otimes I)P(Q_{\omega l - 1} \otimes I) \cdots P(Q_1 \otimes I)P\| < 1$. Since $\mathcal{C}_l$ is compact, $\lambda < 1$.

Set $M_t = P(S_t \otimes I)P(S_{t-1} \otimes I) \cdots P(S_1 \otimes I)P$, $t \ge 1$, and $N_k = P(S_{\omega l k} \otimes I)P(S_{\omega l k - 1} \otimes I) \cdots P(S_{\omega l(k-1)+1} \otimes I)P$ for $k \in \mathbf{q}_t$, where $q_t$ is the unique integer quotient of $t$ divided by $\omega l$. Since $P^2 = P$, it must be true that $M_t = R_t N_{q_t} N_{q_t - 1} \cdots N_1$ where $R_t = P(S_t \otimes I)P(S_{t-1} \otimes I) \cdots P(S_{q_t \omega l + 1} \otimes I)P$. Since the sequences $S_{\omega l(k-1)+(i-1)l+1}, S_{\omega l(k-1)+(i-1)l+2}, \ldots, S_{\omega l(k-1)+il}$, $i \in \{1, 2, \ldots, \omega\}$, $k \in \mathbf{q}_t$, are all in $\mathcal{C}_l$, it must be true that $\|N_k\| \le \lambda^{\omega l}$, $k \in \mathbf{q}_t$. Therefore $\|N_{q_t} N_{q_t - 1} \cdots N_1\| \le \lambda^{\omega l q_t}$, so $\|M_t\| \le \|R_t\| \lambda^{\omega l q_t}$. But for any $m \times m$ stochastic matrix $S$, $\|S \otimes I\| = 1$ because $|S|_\infty = 1$. In addition, $\|P\| \le 1$ because of (15). From these observations and the fact that $\|\cdot\|$ is sub-multiplicative, it follows that $\|R_t\| \le 1$; thus

$$\|M_t\| \le \lambda^{\omega l q_t}. \tag{25}$$

Moreover $t = \omega l q_t + \rho_t$ where $\rho_t$ is the unique integer remainder of $t$ divided by $\omega l$. Thus $\lambda^{\omega l q_t} = \lambda^{t - \rho_t}$. But $\rho_t < \omega l$ and $\lambda < 1$, so $\lambda^{t - \rho_t} \le \lambda^{t - \omega l}$. It follows from this and (25) that the bound asserted in Theorem 3 is true. ■
D. Convergence Rate

In this section we will justify the claim that the expression for $\lambda$ given by © is a worst case bound on the (geometric) convergence rate for the algorithm (2) for the case when $Ax = b$ has a unique solution and all of the neighbor graphs encountered are strongly connected. To establish this claim we will need a lower bound on the coefficients of the nonzero $n \times n$ projection matrix polynomials which comprise the $m \times m$ partition of $P(F_q \otimes I)P(F_{q-1} \otimes I) \cdots P(F_1 \otimes I)P$. The bound is given next.


Lemma 5: Let $s$ be a positive integer and suppose that the nonzero block projection matrix

$$M_{ij} = \sum_{k=1}^{d} \lambda_k P_{h_k(1)} P_{h_k(2)} \cdots P_{h_k(s+1)}$$

is the $ij$th submatrix within the $nm \times nm$ matrix $M = P(F_s \otimes I)P(F_{s-1} \otimes I) \cdots P(F_1 \otimes I)P$ where $d$ is a positive integer, each $h_k(i)$ is an integer in $\mathbf{m}$ and each $\lambda_k$ is a positive number. Then

$$\lambda_k \ge \frac{1}{m^s}, \quad k \in \mathbf{d}.$$

Proof of Lemma 5: We will prove the lemma by induction on $s$. Suppose first that $s = 1$. Then $M = P(F_1 \otimes I)P$ and $M_{ij} = f_{ij} P_i P_j$ where $f_{ij}$ is the $ij$th entry in $F_1$. Since $M_{ij} \neq 0$, $f_{ij} \neq 0$. Since $F_1$ is a flocking matrix, each nonzero entry is bounded below by $\frac{1}{m}$. Thus, in this case the lemma is clearly true.

Now suppose that the lemma holds for all $s$ in the range $1 \le s \le p$ where $p \ge 1$ is an integer. Let $s = p + 1$. Then $M = P(F_s \otimes I)N$ where $N = P(F_{s-1} \otimes I)P(F_{s-2} \otimes I) \cdots P(F_1 \otimes I)P$. Thus, for all $i, j \in \mathbf{m}$,

$$M_{ij} = \sum_{k=1}^{m} f_{ik} P_i N_{kj} \tag{26}$$

where $f_{ik}$ is the $ik$th entry of $F_s$ and $N_{kj}$ is the $kj$th block entry of $N$. Each $N_{kj}$ is either the $n \times n$ zero matrix or a projection matrix polynomial of the form

$$N_{kj} = \sum_{l=1}^{c} \lambda_l P_{h_l(1)} P_{h_l(2)} \cdots P_{h_l(p+1)}$$

where $c$ is a positive integer, each $h_l(i)$ is an integer in $\mathbf{m}$, and for all $l \in \mathbf{c}$, $\lambda_l > 0$. Thus if $N_{kj} \neq 0$, then $\lambda_l \ge \frac{1}{m^p}$ because of the inductive hypothesis. From (26),

$$M_{ij} = \sum_{k=1}^{m} \sum_{l=1}^{c} (f_{ik} \lambda_l) P_i P_{h_l(1)} P_{h_l(2)} \cdots P_{h_l(p+1)}.$$

Since $F_s$ is a flocking matrix, either $f_{ik} = 0$ or $f_{ik} \ge \frac{1}{m}$, which implies that either $f_{ik} \lambda_l = 0$ or $f_{ik} \lambda_l \ge \frac{1}{m^{p+1}}$. Since $M_{ij} \neq 0$, it must therefore be a projection matrix polynomial whose coefficients are all bounded below by $\frac{1}{m^{p+1}}$. Thus the lemma holds for $s = p + 1$. By induction, the lemma is established and the proof is complete. ■
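
The induction in the proof of Lemma 5 can be mirrored by a symbolic expansion. The Python sketch below represents each block of $P(F_s \otimes I) \cdots P(F_1 \otimes I)P$ as a dictionary mapping an index word $(h(1), \ldots, h(s+1))$ to its coefficient, applies (26) once per flocking matrix, and checks the bound $1/m^s$ on two hypothetical neighbor-set sequences. Vertex labels start at 0 here, and the function names and example neighbor sets are assumptions of this sketch.

```python
from fractions import Fraction

def flocking_matrix(neighbor_sets):
    """Row i has entry 1/m_i in column j for each j in neighbor_sets[i], and 0 elsewhere."""
    m = len(neighbor_sets)
    return [[Fraction(1, len(neighbor_sets[i])) if j in neighbor_sets[i] else Fraction(0)
             for j in range(m)] for i in range(m)]

def expand(flocking_matrices):
    """Blocks of P(F_s x I)...P(F_1 x I)P as {word: coefficient}; the list is [F_1, ..., F_s]."""
    m = len(flocking_matrices[0])
    # Blocks of P itself: the jth diagonal block is the single word (j,) with coefficient 1.
    blocks = [[{(i,): Fraction(1)} if i == j else {} for j in range(m)] for i in range(m)]
    for F in flocking_matrices:
        new = [[{} for _ in range(m)] for _ in range(m)]
        for i in range(m):
            for j in range(m):
                for k in range(m):
                    if F[i][k] != 0:
                        # Equation (26): M_ij = sum_k f_ik P_i N_kj.
                        for word, coef in blocks[k][j].items():
                            new[i][j][(i,) + word] = F[i][k] * coef
        blocks = new
    return blocks

F_a = flocking_matrix([{0, 1}, {1, 2}, {0, 2}])
F_b = flocking_matrix([{0, 1, 2}, {1}, {1, 2}])
s, m = 3, 3
blocks = expand([F_a, F_b, F_a])

coefs = [c for row in blocks for block in row for c in block.values()]
print(min(coefs), Fraction(1, m ** s))
```

Every word in block $(k, j)$ begins with $k$, so distinct values of $k$ in (26) never produce the same word, and each coefficient is a product of $s$ nonzero flocking matrix entries.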

Proof of Corollary 1: To prove this corollary, it is sufficient to show that for any set of $q$ flocking matrices $F_1, F_2, \ldots, F_q$, the mixed matrix norm of the matrix

$$M = P(F_q \otimes I)P(F_{q-1} \otimes I) \cdots P(F_1 \otimes I)P$$

satisfies

$$\|M\| \le 1 - \frac{(m-1)(1-\rho)}{m^q} \tag{27}$$

where $\rho$ is given by (3). By definition

$$\|M\| = \max_{i \in \mathbf{m}} \left( \sum_{j=1}^{m} |M_{ij}|_2 \right) \tag{28}$$

where $M_{ij}$ is the $ij$th block entry of $M$. In view of (24), the nominal bound of $M$ is the stochastic matrix $F_q F_{q-1} \cdots F_1$. Thus

$$|M_{ij}|_2 \le f_{ij} \tag{29}$$

where $f_{ij}$ is the $ij$th entry in $F_q F_{q-1} \cdots F_1$.

Fix $i, j \in \mathbf{m}$ with $i \neq j$. As noted just at the end of the proof of Proposition 2, each block entry of $M$, except for the diagonal blocks, must be a complete projection matrix polynomial. Thus $M_{ij}$ must be a nonzero matrix of the form

$$M_{ij} = \sum_{k=1}^{d} \lambda_k P_{h_k(1)} P_{h_k(2)} \cdots P_{h_k(q+1)}$$

where $d$ is a positive integer, each $\lambda_k$ is a real positive number, and each $h_k(i)$ is an integer in $\mathbf{m}$. Completeness also means that for some integer $s \in \mathbf{d}$, each of the matrices in the set $\{P_1, P_2, \ldots, P_m\}$ appears in the product $P_{h_s(1)} P_{h_s(2)} \cdots P_{h_s(q+1)}$ at least once; consequently $P_{h_s(1)} P_{h_s(2)} \cdots P_{h_s(q+1)} \in \mathcal{C}$, so $|P_{h_s(1)} P_{h_s(2)} \cdots P_{h_s(q+1)}|_2 \le \rho$. In addition, $|P_{h_k(1)} P_{h_k(2)} \cdots P_{h_k(q+1)}|_2 \le 1$, $k \in \mathbf{d}$, because of Lemma 2. It follows that

$$\begin{aligned}
|M_{ij}|_2 &\le \sum_{k=1}^{d} \lambda_k |P_{h_k(1)} P_{h_k(2)} \cdots P_{h_k(q+1)}|_2 \\
&= \sum_{k=1,\, k \neq s}^{d} \lambda_k |P_{h_k(1)} P_{h_k(2)} \cdots P_{h_k(q+1)}|_2 + \lambda_s |P_{h_s(1)} P_{h_s(2)} \cdots P_{h_s(q+1)}|_2 \\
&\le \sum_{k=1,\, k \neq s}^{d} \lambda_k + \lambda_s \rho \\
&= \sum_{k=1}^{d} \lambda_k - \lambda_s (1 - \rho).
\end{aligned}$$

Recall that $\sum_{k=1}^{d} \lambda_k$ is the nominal bound of $M_{ij}$; thus $\sum_{k=1}^{d} \lambda_k = f_{ij}$. Meanwhile, by Lemma 5, $\lambda_s \ge \frac{1}{m^q}$. It follows that

$$|M_{ij}|_2 \le f_{ij} - \frac{1}{m^q}(1 - \rho). \tag{30}$$

Now for each $i \in \mathbf{m}$,

$$\sum_{j=1}^{m} |M_{ij}|_2 = |M_{ii}|_2 + \sum_{j=1,\, j \neq i}^{m} |M_{ij}|_2.$$

From (29) and (30) it follows that

$$\sum_{j=1}^{m} |M_{ij}|_2 \le \sum_{j=1,\, j \neq i}^{m} \left( f_{ij} - \frac{1}{m^q}(1 - \rho) \right) + f_{ii}.$$

Clearly

$$\sum_{j=1}^{m} |M_{ij}|_2 \le 1 - \frac{(m-1)}{m^q}(1 - \rho).$$

From this and (28) it follows that (27) is true. ■


VII. Tracking 

An especially important consequence of exponential convergence is that it enables a slightly modified version of algorithm (2) to track the solution to $Ax = b$ with "small error" when $A$ and $b$ are changing with time, provided the rates at which $A$ and $b$ change are sufficiently small. In the sequel we sketch why this is so for the case when the time-varying equation $A(t)x(t) = b(t)$ has a unique solution for every fixed value of $t$. We continue to follow the agreement principle stated earlier in the paper. In particular, suppose that at each time $t$ agent $i$ knows the pair $(A_i(t+1), b_i(t+1))$ and, using it, computes any solution $z_i(t)$ to $A_i(t+1)x = b_i(t+1)$, such as $A_i'(t+1)(A_i(t+1)A_i'(t+1))^{-1} b_i(t+1)$ if $A_i(t+1)$ has linearly independent rows. If $K_i(t)$ is a basis matrix for the kernel of $A_i(t+1)$ and we restrict the updating of $x_i(t)$ to iterations of the form $x_i(t+1) = z_i(t) + K_i(t)u_i(t)$, $t \ge 1$, then no matter what $u_i(t)$ is, each $x_i(t+1)$ will satisfy $A_i(t+1)x_i(t+1) = b_i(t+1)$, $t \ge 1$. Just as before, and for the same reason, we will choose $u_i(t)$ to minimize the difference $(z_i(t) + K_i(t)u_i(t)) - \frac{1}{m_i(t)} \sum_{j \in \mathcal{N}_i(t)} x_j(t)$ in the least squares sense. Doing this leads at once to an iteration for agent $i$ of the form

$$x_i(t+1) = z_i(t) - \frac{1}{m_i(t)} P_i(t) \Big( m_i(t) z_i(t) - \sum_{j \in \mathcal{N}_i(t)} x_j(t) \Big), \quad t \ge 1 \tag{31}$$

where for each $t \ge 0$, $P_i(t)$ is the time-varying orthogonal projection on the kernel of $A_i(t+1)$ and $x_i(1)$ is a solution to $A_i(1)x = b_i(1)$. It is worth noting that even though $z_i(t)$ is not uniquely specified here, update rule (31) is, because $(I - P_i(t))z_i(t)$ is independent of the choice of $z_i(t)$, just as it was in the time-invariant case discussed earlier. The algorithm just described differs from (2) in two respects. First, the $P_i$ are now time dependent and second, instead of using $x_i(t)$ to represent a preliminary estimate of the solution to $A_i(t+1)x = b_i(t+1)$, we use $z_i(t)$ instead. This modification has the advantage of yielding an algorithm which is much easier to analyze than would be the case were we to use $x_i(t)$.

We will assume that $A(t)$ and $b(t)$ are uniformly bounded signals and, for simplicity, we will further assume that each $A_i(t)$ has full row rank for all $t$; more specifically we will require the determinant of $A_i(t)A_i'(t)$ to be bounded away from 0 uniformly. We will also assume that $A(t+1) = A(t) + \delta_A(t)$, $t \ge 1$, and $b(t+1) = b(t) + \delta_b(t)$, $t \ge 1$, where $\delta_A(t)$ and $\delta_b(t)$ are small norm bounded signals. Since $P_i(t) = I - A_i'(t+1)(A_i(t+1)A_i'(t+1))^{-1}A_i(t+1)$, $P_i(t)$ will be uniformly bounded. Note that it is possible to write $P_i(t+1) = P_i(t) + E_i(\delta_A(t+1))$, $t \ge 0$, where $E_i(\cdot)$ is a continuous function satisfying $E_i(0) = 0$.

Our goal is to explain why this algorithm can track the unique solutions $x^*(t)$ to $A(t)x(t) = b(t)$. As a first step, observe that $x^*(t+1) = x^*(t) - \delta(t)$ where $\delta(t) = A^{-1}(t+1)\delta_A(t)A^{-1}(t)b(t) - A^{-1}(t+1)\delta_b(t)$; this follows from $x^*(t) = A^{-1}(t)b(t)$ and the identity $A^{-1}(t+1) - A^{-1}(t) = -A^{-1}(t+1)\delta_A(t)A^{-1}(t)$. Clearly

$$x^*(t+1) = x^*(t) - \frac{1}{m_i(t)} P_i(t) \Big( m_i(t)x^*(t) - \sum_{j \in \mathcal{N}_i(t)} x^*(t) \Big) - \delta(t) \tag{32}$$

for $t \ge 1$ because the term in parentheses on the right of this equation is zero. Thus if we define the error signal

$$e_i(t) = x_i(t) - x^*(t), \quad i \in \mathbf{m}, \; t \ge 1 \tag{33}$$

then $P_i(0)e_i(1) = e_i(1)$ and

$$e_i(t+1) = (I - P_i(t))(z_i(t) - x^*(t+1)) + \frac{1}{m_i(t)} P_i(t) \sum_{j \in \mathcal{N}_i(t)} e_j(t) + P_i(t)\delta(t), \quad t \ge 1.$$

But since both $x^*(t+1)$ and $z_i(t)$ are solutions to $A_i(t+1)x = b_i(t+1)$, the vector $z_i(t) - x^*(t+1)$ is in the kernel of $A_i(t+1)$; this implies that $(I - P_i(t))(z_i(t) - x^*(t+1)) = 0$. It follows that

$$e_i(t+1) = \frac{1}{m_i(t)} P_i(t) \sum_{j \in \mathcal{N}_i(t)} e_j(t) + P_i(t)\delta(t), \quad t \ge 1, \; i \in \mathbf{m}.$$

Hence if we again define $e(t) = \text{column}\{e_1(t), e_2(t), \ldots, e_m(t)\}$ there results

$$e(t+1) = P(t)(F(t) \otimes I)e(t) + P(t)(\mathbf{1} \otimes \delta(t)), \quad t \ge 1 \tag{34}$$

where for $t \ge 0$, $P(t)$ is the $mn \times mn$ matrix $P(t) = \text{diagonal}\{P_1(t), P_2(t), \ldots, P_m(t)\}$, $\mathbf{1}$ is the $m$ vector of 1's, and for $t \ge 1$, $F(t)$ is the same flocking matrix used earlier. Observe that since $P^2(t) = P(t)$, (34) implies that $P(t)e(t+1) = e(t+1)$, $t \ge 1$; thus $P(t-1)e(t) = e(t)$, $t \ge 2$. But $P(0)e(1) = e(1)$ because $P_i(0)e_i(1) = e_i(1)$ as was noted earlier. Therefore $P(t-1)e(t) = e(t)$, $t \ge 1$. If we define $E(t) = \text{diagonal}\{E_1(\delta_A(t)), E_2(\delta_A(t)), \ldots, E_m(\delta_A(t))\}$, $t \ge 1$, then $E(t)$ will have a small norm if $\delta_A(t)$ does. In view of the definition of $E(t)$, $P(t) = P(t-1) + E(t)$, $t \ge 1$. Clearly for $t \ge 1$, $P(t)e(t) = P(t-1)e(t) + E(t)e(t)$, so $P(t)e(t) = e(t) + E(t)e(t)$. Therefore

$$e(t+1) = \big( P(t)(F(t) \otimes I)P(t) - P(t)(F(t) \otimes I)E(t) \big) e(t) + P(t)(\mathbf{1} \otimes \delta(t)), \quad t \ge 1. \tag{35}$$

We claim that for $|\delta_A(t)|_2$ sufficiently small for all $t$, the time varying matrix

$$P(t)(F(t) \otimes I)P(t) - P(t)(F(t) \otimes I)E(t)$$

is exponentially stable assuming the sequence of neighbor graphs $\mathbb{N}(t)$, $t \ge 1$, satisfies the hypotheses of Theorem 1. Because $|P(t)(F(t) \otimes I)E(t)|_2$ will be small if $|\delta_A(t)|_2$ is, to establish exponential stability it is sufficient to show that the matrix $P(t)(F(t) \otimes I)P(t)$ is exponentially stable for $|\delta_A(t)|_2$ sufficiently small. To do this it is convenient to first consider the matrix $M(t, s) = P(s)(F(t) \otimes I)P(s)$. We know already that for every fixed value of $s$, the linear system $z(t+1) = M(t, s)z(t)$ has a unique equilibrium at the origin. In view of Theorem 3 we also know that every solution to this equation tends to the origin exponentially fast. In other words, for each fixed $s$, $M(t, s)$ is an exponentially stable time varying matrix. Our goal is to show that $M(t, t)$ is exponentially stable as well provided $|\delta_A|_2$ is sufficiently small. While doing this is actually a fairly straightforward exercise in linear system theory, it is nonetheless a little bit unusual and so for the sake of clarity we will proceed.

The key fact we will use, which comes from basic Lyapunov theory, is that for every constant $nm \times nm$ matrix $B$ and every fixed value of $s$, the matrix

$$L(t, s, B) = \sum_{\tau = t}^{\infty} \Phi_s'(\tau, t) B \Phi_s(\tau, t)$$

is a uniformly bounded function of $t$, where $\Phi_s(\tau, t)$ is the state transition matrix of $M(t, s)$. This is an immediate consequence of exponential stability. It is also true, and is easily verified, that $L(t, s, B)$ satisfies the Lyapunov equation

$$L(t, s, B) = M'(t, s)L(t+1, s, B)M(t, s) + B, \quad t \ge 1 \tag{36}$$

for all $s \ge 0$. We use these observations in the following way.

Let $Q(t, s) = L(t, s, I)$. Then by a straightforward but tedious computation using (36),

$$Q(t, s+1) - Q(t, s) = \Delta_Q(t, s, \delta_A(s))$$

where $\Delta_Q(t, s, \delta_A)$ is a bounded function of $t$ and $s$ and a continuous function of $\delta_A$ satisfying $\Delta_Q(t, s, 0) = 0$, $t, s \ge 0$. Observe that

$$Q(t, s) = M'(t, s)Q(t+1, s+1)M(t, s) + I - M'(t, s)\Delta_Q(t+1, s, \delta_A(s))M(t, s).$$

Thus if the uniform norm bound on $|\delta_A(t)|_2$ is small enough, then $I - M'(t, t)\Delta_Q(t+1, t, \delta_A(t))M(t, t)$ will be positive definite, implying that

$$M'(t, t)Q(t+1, t+1)M(t, t) - Q(t, t)$$

is negative definite for all $t$ and thus that $z'Q(t, t)z$ is a valid Lyapunov function for the equation $z(t+1) = M(t, t)z(t)$. Therefore the time varying matrix $P(t)(F(t) \otimes I)P(t) - P(t)(F(t) \otimes I)E(t)$ will be an exponentially stable matrix if the norm bound on $\delta_A(t)$ is sufficiently small.

Of course $\delta$ will be small in norm if both $\delta_A$ and $\delta_b$ are. From this and the exponential stability of the system (35), it follows that for sufficiently slow variations in $A$ and $b$, $e$ will be small and in this sense each of the $x_i(t)$ will eventually track, with small error, the time-varying solution $x^*(t)$ to $A(t)x^*(t) = b(t)$. Exponential stability is the key property upon which this conclusion rests.

These observations prompt one to ask a number of questions: How small must $\delta_A$ be for tracking to occur, and what is the "gain" between the sum of the norms of $\delta_A$ and $\delta_b$ and the norm of the tracking error $e$? In the event that $\delta_A$ and $\delta_b$ can be regarded as solutions to neutrally stable linear recursion equations, can the internal model principle [43] be used to modify the algorithm so as to achieve a zero tracking error asymptotically? These are questions for future research.


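Example: The following example is intended to illustrate the tracking capability of the algorithm just discussed. The equation to be solved is $A(t)x(t) = b(t)$ where for $t \ge 1$

$$A(t) = \begin{bmatrix} 2 & 3 & 5 \\ 4 & 9 & -8 \\ 1 & 5 & 10 \end{bmatrix} + \sin(0.1(t-1)) \begin{bmatrix} .1 & .09 & -.24 \\ .2 & -.6 & .1 \\ .03 & .05 & .4 \end{bmatrix}$$

and

$$b(t) = \begin{bmatrix} 10 \\ 5 \\ 16 \end{bmatrix} + \sin(0.6(t-1)) \begin{bmatrix} .1 \\ .2 \\ .3 \end{bmatrix}.$$

Agent $i$ knows the $i$th row of the matrix $[A(t) \;\; b(t)]$ at time $t - 1$ and initializes its state $x_i(t)$ as follows:

$$x_1(1) = \begin{bmatrix} 11.5 \\ -1 \\ -2 \end{bmatrix}, \quad x_2(1) = \begin{bmatrix} 1.25 \\ 0 \\ 0 \end{bmatrix}, \quad x_3(1) = \begin{bmatrix} -9 \\ 1 \\ 2 \end{bmatrix}$$

and $z_i(t-1) = A_i'(t)(A_i(t)A_i'(t))^{-1} b_i(t)$, $i \in \mathbf{3}$. A plot of the evolution of the two norm of the tracking error $e(t)$ is shown in the following figure.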
Fig. 1. $|e(t)|_2$ vs $t$
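
For readers who wish to experiment with the example, the following Python sketch runs update rule (31) on the data above. Two things are assumptions of the sketch: the neighbor graph, taken here to be a fixed complete graph because the graph used for the example is not specified in this section, and the function names. Since each agent holds a single row, $P_i(t)$ and $z_i(t)$ reduce to simple vector formulas. The sketch makes no claim about the size of the resulting error.

```python
import math

A0 = [[2, 3, 5], [4, 9, -8], [1, 5, 10]]
A1 = [[0.1, 0.09, -0.24], [0.2, -0.6, 0.1], [0.03, 0.05, 0.4]]
B0 = [10, 5, 16]
B1 = [0.1, 0.2, 0.3]
X_INIT = [[11.5, -1, -2], [1.25, 0, 0], [-9, 1, 2]]
# Assumption: a fixed complete neighbor graph (every agent hears every agent).
NEIGHBORS = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]

def row(i, t):
    """Row A_i(t) of A(t)."""
    s = math.sin(0.1 * (t - 1))
    return [A0[i][c] + s * A1[i][c] for c in range(3)]

def rhs(i, t):
    """Entry b_i(t) of b(t)."""
    return B0[i] + math.sin(0.6 * (t - 1)) * B1[i]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def x_star(t):
    """Unique solution of A(t)x = b(t), computed by Cramer's rule."""
    A = [row(i, t) for i in range(3)]
    b = [rhs(i, t) for i in range(3)]
    d = det3(A)
    return [det3([[b[r] if k == c else A[r][k] for k in range(3)] for r in range(3)]) / d
            for c in range(3)]

def step(x, t):
    """Map the states x_i(t) to x_i(t+1) using update rule (31)."""
    new = []
    for i in range(3):
        a, b = row(i, t + 1), rhs(i, t + 1)
        aa = dot(a, a)
        z = [b * c / aa for c in a]              # z_i(t) = A_i'(A_i A_i')^{-1} b_i at time t+1
        nbrs = NEIGHBORS[i]
        avg = [sum(x[j][c] for j in nbrs) / len(nbrs) for c in range(3)]
        d = [avg[c] - z[c] for c in range(3)]
        coef = dot(a, d) / aa
        new.append([z[c] + d[c] - coef * a[c] for c in range(3)])  # z + P_i(t)(avg - z)
    return new

def simulate(steps):
    """Return the error norms |e(t)|_2, t = 1..steps+1, and the worst residual |A_i x_i - b_i|."""
    x, errors, worst = [list(v) for v in X_INIT], [], 0.0
    for t in range(1, steps + 2):
        xs = x_star(t)
        errors.append(math.sqrt(sum((x[i][c] - xs[c]) ** 2
                                    for i in range(3) for c in range(3))))
        worst = max(worst, max(abs(dot(row(i, t), x[i]) - rhs(i, t)) for i in range(3)))
        if t <= steps:
            x = step(x, t)
    return errors, worst

errors, worst = simulate(200)
print(errors[0], errors[-1], worst)
```

The list `errors` can be plotted against $t$ for comparison with Fig. 1; the quantity `worst` confirms numerically that every $x_i(t)$ satisfies agent $i$'s own equation at each step, as the construction of (31) guarantees.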


VIII. Asynchronous Operation 

In this section we show that with minor modification, the algorithm we have been studying, namely (2), can be implemented asynchronously. The relevant update rules are given by (38). Since these rules are defined with respect to different and unsynchronized time sequences, for convergence analysis one needs to derive a model on which all update rules evolve synchronously with respect to a single time scale. Such a model is given by (39). Having accomplished this, we then establish the correctness of (38), but only for the case when there are no communication delays. The more realistic version of the problem in which delays are explicitly taken into account is treated in [10]. The ideas exploited there closely parallel those used to analyze the asynchronous version of the unconstrained consensus problem treated in [44].

Let $t$ now take values in the real time interval $[0, \infty)$. We begin by associating with each agent $i$ a strictly increasing, infinite sequence of event times $t_{i1}, t_{i2}, \ldots$ with the understanding that $t_{i1}$ is the time agent $i$ initializes its state and the remaining $t_{ik}$, $k > 1$, are the times at which agent $i$ updates its state. Between any two successive event times $t_{ik}$ and $t_{i(k+1)}$, $x_i(t)$ is held constant. We assume that for any $k \ge 1$, $x_i(t)$ equals its limit from above as $t$ approaches $t_{ik}$; thus $x_i(t)$ is constant on each half-open interval $[t_{ik}, t_{i(k+1)})$, $k \ge 1$.

We assume that for $i \in \{1, 2, \ldots, m\}$, agent $i$'s event times satisfy

$$\bar{T}_i \ge t_{i(k+1)} - t_{ik} \ge T_i, \quad k \in \{1, 2, \ldots\} \tag{37}$$

where $\bar{T}_i$ and $T_i$ are positive numbers such that $\bar{T}_i > T_i$. Thus the event times of agent $i$ are distinct and the difference between any two successive event times cannot be too large. We make no assumptions at all about the relationships between the event times of different agents. In particular, two agents may have completely different unsynchronized event time sequences.

We assume, somewhat unrealistically, that at each of its event times $t_{ik}$ agent $i$ is able to acquire the state $x_j(t_{ik})$ of each of its "neighbors", where by a neighbor of agent $i$ at time $t_{ik}$ is meant any agent in the network whose state is available to agent $i$ at time $t_{ik}$. In the more realistic version of the problem treated in [10], it is assumed that $x_j(t_{ik})$ is only available to agent $i$ after a delay which accounts both for transmission time and the fact that the time at which $x_j(t_{ik})$ is acquired is typically some time in between $t_{ik}$ and one of agent $i$'s subsequent event times. There are some subtle issues here in setting up an appropriate model; we refer the reader to [10] for an explanation of what they are and how they are addressed.

In the sequel, for $k > 1$ we write $\mathcal{N}_i(t_{ik})$ for the set of labels of agent $i$'s neighbors at time $t_{ik}$, while for $k = 1$ we define $\mathcal{N}_i(t_{i1}) = \{i\}$. Since agent $i$ is always taken to be a neighbor of itself, $\mathcal{N}_i(t_{ik})$ is never empty.

Prompted by (2), the update rule for agent $i$ we want to consider for the asynchronous case is

$$x_i(t_{i(k+1)}) = x_i(t_{ik}) - \frac{1}{m_i(t_{ik})} P_i \Big( m_i(t_{ik}) x_i(t_{ik}) - \sum_{j \in \mathcal{N}_i(t_{ik})} x_j(t_{ik}) \Big) \tag{38}$$

where $k \ge 1$, $m_i(t_{ik})$ is the number of labels in $\mathcal{N}_i(t_{ik})$, and as before, $P_i$ is the orthogonal projection on the kernel of $A_i$.

To proceed we need a common time scale on which all $m$ agent update rules can be defined. For this, let $t_1 = \max_i \{t_{i1}\}$ and write $\mathcal{T}_i$ for the event times of agent $i$ which are greater than or equal to $t_1$. Let $\mathcal{T}$ denote the set of all event times of all $m$ agents which are greater than or equal to $t_1$. Thus $\mathcal{T}$ is the union of the $\mathcal{T}_i$. Relabel the times in $\mathcal{T}$ as $t_1, t_2, \ldots, t_p, \ldots$ so that $t_p < t_{p+1}$ for $p \ge 1$. We define the extended neighbor set of agent $i$, written $\bar{\mathcal{N}}_i(p)$, to be $\mathcal{N}_i(t_p)$ if $t_p$ is an event time of agent $i$. For times $t_p \in \mathcal{T}$ which are not event times of agent $i$, we define $\bar{\mathcal{N}}_i(p) = \{i\}$. Doing this enables us to extend the domain of applicability of update rule (38) from $\mathcal{T}_i$ to all of $\mathcal{T}$. In particular, for $p \ge 1$,

$$x_i(t_{p+1}) = x_i(t_p) - \frac{1}{\bar{m}_i(p)} P_i \Big( \bar{m}_i(p) x_i(t_p) - \sum_{j \in \bar{\mathcal{N}}_i(p)} x_j(t_p) \Big) \tag{39}$$

where $\bar{m}_i(p)$ is the number of indices in $\bar{\mathcal{N}}_i(p)$. The validity of this formula is a simple consequence of the assumption that for $i \in \{1, 2, \ldots, m\}$, $x_i(t)$ is constant on each half-open interval $[t_{ik}, t_{i(k+1)})$, $k \ge 1$.
Observe that (39) is essentially the same as update rule (2) except that extended neighbor sets replace the original neighbor sets. As with the synchronous case, convergence depends on connectivity of the graphs determined by the neighbor sets upon which update rules (39) depend. Accordingly, for each $p \ge 1$ we define the extended neighbor graph $\bar{\mathbb{N}}(p)$ to be that directed graph in $\mathcal{G}_{sa}$ which has an arc from vertex $j$ to vertex $i$ if $j \in \bar{\mathcal{N}}_i(p)$. The following is an immediate consequence of Theorem 1.

Theorem 4: Suppose each agent $i$ updates its state $x_i(t)$ according to rule (38). Suppose in addition that for some positive integer $l$, the sequence of extended neighbor graphs $\bar{\mathbb{N}}(p)$, $p \ge 1$, is repeatedly jointly strongly connected. Then there exists a positive constant $\lambda < 1$ for which all $x_i(t_p)$ converge to the same solution to $Ax = b$ as $p \to \infty$, as fast as $\lambda^p$ converges to 0.

Perhaps of greatest interest is the situation when the original neighbor graph $\mathbb{N}(t)$ is independent of time. In this case it is possible to address convergence without reference to extended neighbor graphs.

Corollary 2: Suppose that the original neighbor graph $\mathbb{N}(t)$ is independent of time and strongly connected. Suppose each agent $i$ updates its state $x_i(t)$ according to rule (38). Then there exists a positive constant $\lambda < 1$ for which all $x_i(t_p)$ converge to the same solution to $Ax = b$ as $p \to \infty$, as fast as $\lambda^p$ converges to 0.

The proof of Corollary 2 depends on the following lemma.

Lemma 6: Suppose that the original neighbor graph $\mathbb{N}(t)$ is a constant graph $\mathbb{N}$. For $i \in \mathbf{m}$, let $\bar{T}_i$ be an upper bound on the difference between each pair of successive event times of agent $i$. Then for any pair of event times $t_a, t_b \in \mathcal{T}$ satisfying $t_b - t_a \ge \max\{\bar{T}_1, \bar{T}_2, \ldots, \bar{T}_m\}$, $\mathbb{N}$ is a spanning subgraph of the composed graph $\bar{\mathbb{N}}(b) \circ \bar{\mathbb{N}}(b-1) \circ \cdots \circ \bar{\mathbb{N}}(a)$.

Proof of Lemma 6: Let $\mathcal{N}_i$ denote the neighbor set of agent $i$. For $i \in \mathbf{m}$, $t_{i(j+1)} - t_{ij} \le \bar{T}_i \le t_b - t_a$, $j \ge 1$. Therefore the set $\{t_a, t_{a+1}, \ldots, t_b\}$ must contain at least one event time $t_{p_i}$ of each agent $i$. Since $\bar{\mathcal{N}}_i(p_i) = \mathcal{N}_i$, $i \in \mathbf{m}$, for each $j \in \mathcal{N}_i$ there must be an arc from $j$ to $i$ in $\bar{\mathbb{N}}(p_i)$. It follows from the definition of $\mathbb{N}$ that its arc set must be contained in the union of the arc sets of the graphs $\bar{\mathbb{N}}(a), \bar{\mathbb{N}}(a+1), \ldots, \bar{\mathbb{N}}(b)$. But the arc set of the union of a finite number of graphs in $\mathcal{G}_{sa}$ is always a subset of the arc set of their composition [35]. Therefore the lemma is true. ■

Proof of Corollary 2: Set $T_{\max} = \max\{\bar{T}_1, \bar{T}_2, \ldots, \bar{T}_m\}$ and $T_{\min} = \min\{T_1, T_2, \ldots, T_m\}$ and let $q$ be any positive integer for which $T_{\max} < qT_{\min}$. Let $a$ and $b$ be positive integers satisfying $b - a = mq$. We claim that $t_b - t_a \ge T_{\max}$. To prove that this is so, suppose the contrary, namely that $t_b - t_a < T_{\max}$. Then $t_b - t_a < qT_{\min}$. But for each $i \in \mathbf{m}$, $T_{\min}$ is no larger than the time between any two successive event times of agent $i$. Thus the closed interval $[t_a, t_b]$ must contain at most $q$ event times of agent $i$. Since there are $m$ agents, $[t_a, t_b]$ must contain at most $mq$ event times. Therefore $b - a < mq$, which is a contradiction.

In view of the preceding, $t_b - t_a \ge T_{\max}$ for any positive integers $a$ and $b$ satisfying $b - a = mq$. Therefore, by Lemma 6, $\mathbb{N}$ must be a spanning subgraph of the composed graphs $\bar{\mathbb{N}}(b) \circ \bar{\mathbb{N}}(b-1) \circ \cdots \circ \bar{\mathbb{N}}(a)$ for all such $a$ and $b$. But $\mathbb{N}$ is strongly connected, so each such composed graph must be strongly connected as well. Therefore the sequence of graphs $\bar{\mathbb{N}}(1), \bar{\mathbb{N}}(2), \ldots$ is repeatedly jointly strongly connected by successive subsequences of length $mq$. From this and Theorem 4 it follows that Corollary 2 is true. ■
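
The counting argument in the proof above can be checked numerically on a finite horizon. The Python sketch below builds event times for three agents from hypothetical cyclic gap patterns that respect bounds of the form (37), merges them into a common time scale, and verifies that every window with $b - a = mq$ has $t_b - t_a \ge T_{\max}$ and contains an event time of every agent. All names and numerical values are assumptions of this sketch, and a finite check of this kind illustrates the argument without replacing it.

```python
from fractions import Fraction as Fr
from itertools import cycle, islice

def event_times(start, gaps, count):
    """First `count` event times of one agent: start, then the gaps repeated cyclically."""
    times, t = [start], start
    for g in islice(cycle(gaps), count - 1):
        t += g
        times.append(t)
    return times

# Hypothetical gap patterns; agent i's gaps lie between T_LOW[i] and T_HIGH[i].
GAPS = [[Fr(1), Fr(2), Fr(3, 2)], [Fr(3), Fr(1), Fr(2)], [Fr(2), Fr(3), Fr(5, 2)]]
STARTS = [Fr(0), Fr(1, 2), Fr(1, 4)]
T_LOW = [min(g) for g in GAPS]
T_HIGH = [max(g) for g in GAPS]

agents = [event_times(s, g, 40) for s, g in zip(STARTS, GAPS)]
m = len(agents)
t_first = max(a[0] for a in agents)      # latest initialization time
horizon = min(a[-1] for a in agents)     # every agent has event times up to here
merged = sorted({t for a in agents for t in a if t_first <= t <= horizon})

t_max, t_min = max(T_HIGH), min(T_LOW)
q = int(t_max / t_min) + 1               # smallest integer with t_max < q * t_min
window = m * q

def window_ok(a):
    """Check the window t_a, ..., t_b with b - a = m * q."""
    ta, tb = merged[a], merged[a + window]
    long_enough = tb - ta >= t_max
    everyone_updates = all(any(ta <= t <= tb for t in ag) for ag in agents)
    return long_enough and everyone_updates

all_ok = all(window_ok(a) for a in range(len(merged) - window))
print(q, window, len(merged), all_ok)
```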

IX. Least Squares 

A limitation of the algorithm we have been discussing is that it is only applicable to linear 
equations for which there are solutions. In this section we explain how to modify the algorithm 
so that it can obtain least squares solutions to Ax = b even in the case when Ax = b does not 
have a solution. As before, we will approach the problem using standard consensus concepts 
rather than the more restrictive concepts based on distributed averaging. To keep things simple, 
we will assume that the $A_i$ are full column rank matrices.

By the least squares solution to $Ax = b$ is meant a value of $x$ for which $A'Ax = A'b$. As is well known, least squares solutions always exist, even if $Ax = b$ does not have a solution. It is very easy to verify that a common least squares solution $x$ to all of the agent equations $A_j x = b_j$, $j \in \mathbf{m}$, will not exist unless $Ax = b$ has a solution. Thus if a decentralized least squares solution to $Ax = b$ is to be obtained in accordance with the agreement principle, then each agent must solve a different problem. To understand what that problem might be, consider for example the situation in which there are three agents. Suppose that the state $x_i$ of agent $i$ is augmented with two additional $n$-vectors, namely $y_i$ and $z_i$, and that agents 1, 2 and 3 are tasked to solve the linear equations

$$\begin{aligned}
A_1'A_1 x_1 + y_1 &= A_1'b_1 \\
A_2'A_2 x_2 + z_2 &= A_2'b_2 \\
A_3'A_3 x_3 - y_3 - z_3 &= A_3'b_3
\end{aligned}$$

respectively. As we will show, it is always possible for the agents to do this and at the same time to obtain values of the $x_i$, $y_i$ and $z_i$ for which the three augmented state vectors $\bar{x}_i = [\, x_i' \;\; y_i' \;\; z_i' \,]'$, $i \in \mathbf{3}$, are the same.

The existence of a vector $\bar{x} = [\, x' \;\; y' \;\; z' \,]'$ for which $\bar{x}_i = \bar{x}$, $i \in \mathbf{3}$, is equivalent to the existence of a solution to the equations $A_1'A_1 x + y = A_1'b_1$, $A_2'A_2 x + z = A_2'b_2$, and $A_3'A_3 x - y - z = A_3'b_3$. In matrix terms, existence amounts to asking whether or not the equation $M\bar{x} = q$ has a solution where

$$M = \begin{bmatrix} A_1'A_1 & I & 0 \\ A_2'A_2 & 0 & I \\ A_3'A_3 & -I & -I \end{bmatrix} \quad \text{and} \quad q = \begin{bmatrix} A_1'b_1 \\ A_2'b_2 \\ A_3'b_3 \end{bmatrix}.$$

Note that by simply adding block rows 1 and 2 of $[M \;\; q]$ to block row 3, one obtains the matrix $[\bar{M} \;\; \bar{q}]$ where

$$\bar{M} = \begin{bmatrix} A_1'A_1 & I & 0 \\ A_2'A_2 & 0 & I \\ A_1'A_1 + A_2'A_2 + A_3'A_3 & 0 & 0 \end{bmatrix} \quad \text{and} \quad \bar{q} = \begin{bmatrix} A_1'b_1 \\ A_2'b_2 \\ A_1'b_1 + A_2'b_2 + A_3'b_3 \end{bmatrix}.$$

Clearly the set of solutions to $M\bar{x} = q$ is the same as the set of solutions to $\bar{M}\bar{x} = \bar{q}$ because the matrices $[M \;\; q]$ and $[\bar{M} \;\; \bar{q}]$ are row equivalent. It is obvious that $\bar{M}$ has linearly independent columns because $A_1'A_1 + A_2'A_2 + A_3'A_3$ is nonsingular; therefore $\bar{M}$ is nonsingular. As a result, a solution to $M\bar{x} = q$ must exist. Note in addition that since such a solution must also satisfy $\bar{M}\bar{x} = \bar{q}$, $x$ must satisfy $(A_1'A_1 + A_2'A_2 + A_3'A_3)x = A_1'b_1 + A_2'b_2 + A_3'b_3$, which is the least squares equation $A'Ax = A'b$. Therefore $x$ solves the least squares problem.
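
The three-agent construction can be checked on a small instance. The Python sketch below takes the scalar case $n = 1$, so that each $A_i'A_i$ and $A_i'b_i$ is a number, builds $M$ and $q$, solves $M\bar{x} = q$ with exact rational arithmetic, and compares the first entry of the solution with the least squares solution of the stacked equation. The data, the restriction to $n = 1$, and the helper names are assumptions of this sketch.

```python
from fractions import Fraction as Fr

def solve(M, q):
    """Gauss-Jordan elimination with exact rationals; assumes M is square and nonsingular."""
    n = len(M)
    aug = [[Fr(v) for v in row] + [Fr(r)] for row, r in zip(M, q)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

# Hypothetical data with n = 1: A_i is a column of numbers, b_i a vector of the same length.
A = [[1, 2], [1], [2, 1]]
b = [[1, 0], [3], [0, 5]]

g = [sum(a * a for a in Ai) for Ai in A]                     # A_i'A_i
c = [sum(a * v for a, v in zip(Ai, bi)) for Ai, bi in zip(A, b)]  # A_i'b_i

M = [[g[0], 1, 0],
     [g[1], 0, 1],
     [g[2], -1, -1]]
q = c

x, y, z = solve(M, q)
x_ls = Fr(sum(c), sum(g))      # solution of A'Ax = A'b for the stacked A and b
print(x, y, z, x_ls)
```

Here the stacked equation $Ax = b$ has no solution, yet $M\bar{x} = q$ does, and its first component agrees with the least squares solution.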

Recall that the idea exploited earlier in the paper for crafting an algorithm for solving $Ax = b$ was that if each agent $i$ were able to compute a solution $x_i$ to its own equation $A_i x_i = b_i$ and at the same time all agents were able to reach a consensus in that all $x_i$ were equal, then automatically each $x_i$ would necessarily satisfy $Ax_i = b$. This led at once to the linear iterations (2) which provide distributed solutions to $Ax = b$. Since with obvious modification the same idea applies to the least squares problem under consideration here, it is clear that the same approach will lead to linear iterations which provide a distributed solution to the least squares equation $A'Ax = A'b$. The update equations in this case are identical with those in (2) except that in place of $x_i$ and $P_i$ one would use the $\bar{x}_i$ and $\bar{P}_i$, where $\bar{P}_i$ is the orthogonal projection matrix on the kernel of the $i$th block row in $M$. Under exactly the same conditions as those stated in Theorem 1, the $\bar{x}_i$ so obtained will all converge exponentially fast to the desired least squares solution.


A. Generalization 


The idea just illustrated by example generalizes in a straightforward way to any m agent 
network. The first step would be to pick any m vertex tree graph T and orient it. Agent i's 
augmented state would then be of the form $\bar x_i = [x_i' \; x_{i1}' \; x_{i2}' \; \cdots \; x_{i(m-1)}']'$ where all $x_{ij} \in \mathbb{R}^n$. 
Instead of solving $A_ix_i = b_i$, agent i would be tasked with solving $[A_i'A_i \;\; h_i \otimes I]\,\bar x_i = A_i'b_i$ 
where $h_i$ is the ith row of the $m \times (m-1)$ incidence matrix H of T. At the same time, all m agents 
would be expected to reach a consensus in which all $\bar x_i$ are equal. Were a consensus reached at 
a value $\bar x = [x' \; y_1' \; y_2' \; \cdots \; y_{m-1}']'$, then $\bar x$ would have to satisfy the equation $M\bar x = q$ where 



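$$M = \begin{bmatrix} \begin{matrix} A_1'A_1 \\ \vdots \\ A_m'A_m \end{matrix} & H \otimes I \end{bmatrix} \quad \text{and} \quad q = \begin{bmatrix} A_1'b_1 \\ \vdots \\ A_m'b_m \end{bmatrix}.$$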


We claim that a solution to $M\bar x = q$ must exist and that the sub-vector x within $\bar x$ is the solution 
to the least squares problem. To understand why, first note that the block rows of $H \otimes I$ sum 
to zero because the rows of H sum to zero. Thus if E is the product of elementary row matrices 
which adds the first (m - 1) block rows of $H \otimes I$ to the last, then $E(H \otimes I)$ must be of the 
form 

$$E(H \otimes I) = \begin{bmatrix} D \\ 0 \end{bmatrix}_{nm \times (m-1)n}$$

where D is a square matrix. Moreover D must be nonsingular because rank $E(H \otimes I)$ = 
rank $H \otimes I$ and rank $H \otimes I$ = (m - 1)n. This last rank identity is a consequence of the fact 
that the rank of an incidence matrix of an m vertex connected graph, namely rank H, equals 
m - 1. 
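
The structural facts used in this argument can be checked numerically on a small case. The following is a minimal sketch, assuming T is an oriented path on m vertices and assuming the sign convention +1 at the tail and -1 at the head of each arc; the helper name `path_incidence` and the sizes m = 4, n = 2 are illustrative choices rather than anything fixed by the paper.

```python
import numpy as np

def path_incidence(m):
    """Incidence matrix (m x (m-1)) of the oriented path 1 -> 2 -> ... -> m.

    Hypothetical helper: the orientation and sign convention are assumptions.
    """
    H = np.zeros((m, m - 1))
    for k in range(m - 1):
        H[k, k] = 1.0        # arc k leaves vertex k
        H[k + 1, k] = -1.0   # arc k enters vertex k+1
    return H

m, n = 4, 2
H = path_incidence(m)
HI = np.kron(H, np.eye(n))   # H (x) I, of size nm x (m-1)n

# E adds the first m-1 block rows to the last block row.
E = np.eye(n * m)
E[-n:, : n * (m - 1)] = np.tile(np.eye(n), (1, m - 1))

EHI = E @ HI
D = EHI[: n * (m - 1), :]          # top square block
last_block = EHI[n * (m - 1):, :]  # should vanish

rank_H = np.linalg.matrix_rank(H)
rank_D = np.linalg.matrix_rank(D)
print(rank_H, rank_D, np.abs(last_block).max())
```

In this sketch the last block row of $E(H \otimes I)$ vanishes because every column of H contains exactly one +1 and one -1, and D inherits full rank from H.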

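Next observe that the set of solutions to $M\bar x = q$ is the same as the set of solutions to 
$EM\bar x = Eq$. But 

$$EM = \begin{bmatrix} \begin{matrix} A_1'A_1 \\ \vdots \\ A_{m-1}'A_{m-1} \end{matrix} & D \\ A'A & 0 \end{bmatrix} \quad \text{and} \quad Eq = \begin{bmatrix} A_1'b_1 \\ \vdots \\ A_{m-1}'b_{m-1} \\ A'b \end{bmatrix}.$$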


Moreover EM is nonsingular because D is nonsingular and, as in the example above, A'A is nonsingular, so a solution to $EM\bar x = Eq$ and consequently to $M\bar x = q$ 
exists. Note in addition that, since such a solution must also satisfy $EM\bar x = Eq$, the last block row shows that x must satisfy 
A'Ax = A'b. Therefore x solves the least squares problem. 
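
The claim can be illustrated with a centralized check. The following is a minimal sketch, assuming an oriented path for T, randomly generated data for which Ax = b has no exact solution, and A with linearly independent columns; the variable names and sizes are illustrative assumptions. It assembles M and q as above, solves $M\bar x = q$ directly, and compares the leading sub-vector with an ordinary least squares solve.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2                      # three agents, unknown x in R^2 (illustrative)
rows_per_agent = 3

A_blocks = [rng.standard_normal((rows_per_agent, n)) for _ in range(m)]
b_blocks = [rng.standard_normal(rows_per_agent) for _ in range(m)]
A = np.vstack(A_blocks)
b = np.concatenate(b_blocks)

# Incidence matrix of the oriented path 1 -> 2 -> ... -> m (assumed tree T).
H = np.zeros((m, m - 1))
for k in range(m - 1):
    H[k, k] = 1.0
    H[k + 1, k] = -1.0

# M = [ stacked A_i'A_i | H (x) I ],  q = stacked A_i'b_i
left = np.vstack([Ai.T @ Ai for Ai in A_blocks])
M = np.hstack([left, np.kron(H, np.eye(n))])
q = np.concatenate([Ai.T @ bi for Ai, bi in zip(A_blocks, b_blocks)])

x_bar = np.linalg.solve(M, q)    # augmented solution [x; y_1; ...; y_{m-1}]
x = x_bar[:n]                    # sub-vector claimed to solve least squares
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.abs(x - x_ls).max())
```

The remaining entries of `x_bar` are the auxiliary vectors $y_j$; they carry no meaning for the least squares problem beyond making the block rows consistent.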

We have just shown that if each agent i updates its augmented state $\bar x_i(t)$ along a path for which 
$[A_i'A_i \;\; h_i \otimes I]\,\bar x_i(t) = A_i'b_i$, so that $\bar x_i(t)$ reaches a limit which agrees with the augmented states 
of all other agents, then the limiting value of the sub-vector $x_i(t)$ will solve the least squares 
problem. The agent update equations for accomplishing this are identical to those for Ax = b except 
that in place of $x_i$ and $P_i$, agent i would use $\bar x_i$ and $\bar P_i$, where $\bar P_i$ is the orthogonal 
projection matrix on the kernel of $[A_i'A_i \;\; h_i \otimes I]$. Under exactly the same conditions as 
stated in the convergence theorem for that algorithm, the $x_i$ so obtained will all converge exponentially fast to the desired least 
squares solution. 
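
To make the procedure concrete, the following is a minimal simulation sketch rather than a definitive implementation. It assumes the update takes the projection-consensus form $\bar x_i(t+1) = \bar x_i(t) - \bar P_i(\bar x_i(t) - \text{average of the neighbors' states})$, that every agent is a neighbor of every other agent (a fixed complete graph, a special case of the connectivity condition), and that T is an oriented path. The function names, the data, and the stopping rule are illustrative assumptions; in particular the iteration count is computed from a centrally evaluated contraction factor purely for the demonstration.

```python
import numpy as np

def path_incidence(m):
    # Hypothetical helper: incidence matrix of the oriented path 1 -> ... -> m.
    H = np.zeros((m, m - 1))
    for k in range(m - 1):
        H[k, k], H[k + 1, k] = 1.0, -1.0
    return H

def distributed_least_squares(A_blocks, b_blocks, tol=1e-10):
    m, n = len(A_blocks), A_blocks[0].shape[1]
    H = path_incidence(m)
    # Agent i's equation: [A_i'A_i  h_i (x) I] xbar_i = A_i'b_i
    B = [np.hstack([Ai.T @ Ai, np.kron(H[i:i + 1, :], np.eye(n))])
         for i, Ai in enumerate(A_blocks)]
    c = [Ai.T @ bi for Ai, bi in zip(A_blocks, b_blocks)]
    # Orthogonal projections onto the kernels of the agents' equations.
    P = [np.eye(n * m) - np.linalg.pinv(Bi) @ Bi for Bi in B]
    # Each agent starts at a solution of its own equation.
    X = [np.linalg.pinv(Bi) @ ci for Bi, ci in zip(B, c)]
    # Contraction factor of the averaged projections (complete graph only);
    # used here only to pick an iteration count for the demonstration.
    rate = np.linalg.norm(sum(P) / m, 2)
    steps = int(np.ceil(np.log(tol) / np.log(rate)))
    for _ in range(steps):
        mean = sum(X) / m                      # average over all agents
        X = [Xi - Pi @ (Xi - mean) for Xi, Pi in zip(X, P)]
    return [Xi[:n] for Xi in X], steps

# Illustrative data: Ax = b has no exact solution.
A_blocks = [np.array([[1.0, 0.0], [0.0, 1.0]]),
            np.array([[1.0, 1.0], [0.0, 1.0]]),
            np.array([[1.0, 0.0], [1.0, 1.0]])]
b_blocks = [np.array([1.0, 2.0]), np.array([0.0, 1.0]), np.array([3.0, -1.0])]

estimates, steps = distributed_least_squares(A_blocks, b_blocks)
x_ls = np.linalg.lstsq(np.vstack(A_blocks), np.concatenate(b_blocks), rcond=None)[0]
max_err = max(np.abs(xi - x_ls).max() for xi in estimates)
print(steps, max_err)
```

Each agent in this sketch stores a vector of length nm, which is the scaling issue discussed next.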

Although the algorithm just described solves the distributed least squares problem, it has 
several shortcomings. First, there must be a network-wide design step in which T is specified; 
this conceivably can be accomplished in a distributed manner. Second, the size of the augmented 
state vector of each agent is nm, which does not scale well with the number of agents in the 
network. It is possible to significantly improve on the scaling problem if neighbor relations are 
time invariant and there is bi-directional communication between neighbors. How to do this will 
be addressed in another paper. 


X. Concluding Remarks 

In this paper we have described a distributed algorithm for solving a solvable linear equation 
and given necessary and sufficient conditions for it to generate a sequence of estimates which 
converge to a solution exponentially fast. For the case when the equation admits a unique solution, 
we have derived an expression for a worst case geometric convergence rate. We have shown that 
with minor modification, the algorithm can track the solution to Ax = b if A and b change with 
time, provided the rates of change of these two matrices are sufficiently small. We have shown that 
the same algorithm can function asynchronously provided there are no communication delays, 
and we have sketched a new idea for obtaining least squares solutions to Ax = b which can be 
used even if Ax = b has no solution. 

We have left a number of issues open for future research. One is to figure out what the 
relationship is between the parameter ρ which appears in the convergence rate bound and 
a condition number of A. Another is to more tightly quantify the relationship between the 
variations in A and b in the event they are time varying, and the tracking error e. Yet another 
is to modify the least squares algorithm discussed in §IX to reduce the amount of information 
which needs to be transmitted between agents. This last issue will be addressed in a future paper. 